Presented by: https://jafrilibrary.org 




ELEMENTARY
STATISTICS 


PICTURING THE WORLD 
Fifth Edition 


Ron Larson 


The Pennsylvania State University 
The Behrend College 


Betsy Farber 


Bucks County Community College 


Prentice Hall 
Boston Columbus Indianapolis New York San Francisco Upper Saddle River 
Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montréal Toronto 
Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo




Editor-in-Chief: Deirdre Lynch 

Acquisitions Editor: Marianne Stepanian 

Senior Content Editor: Chere Bemelmans 

Editorial Assistant: Sonia Ashraf 

Senior Managing Editor: Karen Wernholm 

Associate Managing Editor: Tamela Ambush 
Supplements Production Coordinator: Katherine Roz 
Media Producer: Audra Walsh 

MathXL Project Supervisor: Bob Carroll 

TestGen QA Manager Assessment Content: Marty Wright 
Senior Marketing Manager: Alex Gay 

Marketing Assistant: Kathleen DeChavez 

Senior Author Support/Technology Specialist: Joe Vetere 
Rights and Permissions Advisor: Michael Joyce 
Image Manager: Rachel Youdelman 

Senior Manufacturing Buyer: Carol Melville 

Senior Media Buyer: Ginny Michaud 

Design Manager: Andrea Nix 

Senior Designer: Beth Paquin 

Text Design: Lisa Kuhn, Curio Press LLC 
Composition: Larson Texts, Inc. 

Illustrations: Larson Texts, Inc. 


Cover Design: Rokusek Design 
Cover Image: Alice/Getty Images 


For permission to use copyrighted material, grateful acknowledgment is made to the copyright 
holders on the page following the Index, which is hereby made part of this copyright page. 


Many of the designations used by manufacturers and sellers to distinguish their 
products are claimed as trademarks. Where those designations appear in this book, 
and Pearson Education was aware of a trademark claim, the designations have 
been printed in initial caps or all caps. 


Library of Congress Cataloging-in-Publication Data 
Larson, Ron, 1941- 
Elementary statistics : picturing the world / Ron Larson, Betsy Farber. -- 5th ed.
p.cm. 
ISBN 978-0-321-69362-4 
1. Statistics--Textbooks. I. Farber, Elizabeth. II. Title. 
QA276.12.L373 2012 
519.5--dc22
2010000454 


Copyright © 2012, 2009, 2006, 2003 Pearson Education, Inc. 


All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, 
in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior 
written permission of the publisher. Printed in the United States of America. For information on obtaining 
permission for use of material in this work, please submit a written request to Pearson Education, Inc., Rights 
and Contracts Department, 501 Boylston Street, Suite 900, Boston, MA 02116, fax your request to 617-671-3447, 
or e-mail at http://www.pearsoned.com/legal/permissions.htm.


123456789 10—QGV—14 13 12 11 10 


Prentice Hall 
is an imprint of 


PEARSON ISBN 10: 0-321-69362-0 


www.pearsonhighered.com ISBN 13: 978-0-321-69362-4 




Ron Larson 
The Pennsylvania State 
University 
The Behrend College 


Betsy Farber 


Bucks County 
Community College 


ABOUT THE AUTHORS


Ron Larson received his Ph.D. in mathematics from the University of 
Colorado in 1970. At that time he accepted a position with Penn State 
University, and he currently holds the rank of professor of mathematics 
at the university. Larson is the lead author of more than two dozen 
mathematics textbooks that range from sixth grade through calculus 
levels. Many of his texts, such as the eighth edition of his calculus 
text, are leaders in their markets. Larson is also one of the pioneers in 
the use of multimedia and the Internet to enhance the learning of 
mathematics. He has authored multimedia programs, extending from 
the elementary school through calculus levels. Larson is a member of 
several professional groups and is a frequent speaker at national and 
regional mathematics meetings. 


Betsy Farber received her Bachelor’s degree in mathematics from Penn 
State University and her Master’s degree in mathematics from the College 
of New Jersey. Since 1976, she has been teaching all levels of mathematics 
at Bucks County Community College in Newtown, Pennsylvania, where 
she currently holds the rank of professor. She is particularly interested in 
developing new ways to make statistics relevant and interesting to her 
students and has been teaching statistics in many different modes—with 
the TI-83 Plus, with MINITAB, and by distance learning as well as in the 
traditional classroom. A member of the American Mathematical 
Association of Two-Year Colleges (AMATYC), she is an author of 
The Student Edition to MINITAB and A Guide to MINITAB. She served 
as consulting editor for Statistics, A First Course and has written 
computer tutorials for the CD-ROM correlating to the texts in the 
Streeter Series in mathematics. 




CONTENTS

Preface X
Supplements XII
Acknowledgments XIV
How to Study Statistics XV
Index of Applications XVI


PART ONE DESCRIPTIVE STATISTICS 


INTRODUCTION TO STATISTICS 


Where You’ve Been and Where You’re Going 1
1.1 An Overview of Statistics 2
1.2 Data Classification 9
■ Case Study: Rating Television Shows in the United States 15
1.3 Data Collection and Experimental Design 16
■ Activity: Random Numbers 26
■ Uses and Abuses 27
Chapter Summary 28
Review Exercises 29
Chapter Quiz 31
■ Real Statistics-Real Decisions: Putting It All Together 32
■ History of Statistics: Timeline 33
■ Technology: Using Technology in Statistics 34


DESCRIPTIVE STATISTICS 36

Where You’ve Been and Where You’re Going 37
2.1 Frequency Distributions and Their Graphs 38
2.2 More Graphs and Displays 53
2.3 Measures of Central Tendency 65
■ Activity: Mean Versus Median 79
2.4 Measures of Variation 80
■ Activity: Standard Deviation 98
■ Case Study: Earnings of Athletes 99
2.5 Measures of Position 100
■ Uses and Abuses 113
Chapter Summary 114
Review Exercises 115
Chapter Quiz 119
■ Real Statistics-Real Decisions: Putting It All Together 120
■ Technology: Monthly Milk Production 121
■ Using Technology to Determine Descriptive Statistics 122
Cumulative Review: Chapters 1-2 124


PART TWO PROBABILITY AND PROBABILITY DISTRIBUTIONS


PROBABILITY 126

Where You’ve Been and Where You’re Going 127
3.1 Basic Concepts of Probability and Counting 128
■ Activity: Simulating the Stock Market 144
3.2 Conditional Probability and the Multiplication Rule 145
3.3 The Addition Rule 156
■ Activity: Simulating the Probability of Rolling a 3 or 4 166
■ Case Study: United States Congress 167
3.4 Additional Topics in Probability and Counting 168
■ Uses and Abuses 179
Chapter Summary 180
Review Exercises 181
Chapter Quiz 185
■ Real Statistics-Real Decisions: Putting It All Together 186
■ Technology: Simulation: Composing Mozart Variations with Dice 187


DISCRETE PROBABILITY DISTRIBUTIONS 188

Where You’ve Been and Where You’re Going 189
4.1 Probability Distributions 190
4.2 Binomial Distributions 202
■ Activity: Binomial Distribution 216
■ Case Study: Binomial Distribution of Airplane Accidents 217
4.3 More Discrete Probability Distributions 218
■ Uses and Abuses 225
Chapter Summary 226
Review Exercises 227
Chapter Quiz 231
■ Real Statistics-Real Decisions: Putting It All Together 232
■ Technology: Using Poisson Distributions as Queuing Models 233

NORMAL PROBABILITY DISTRIBUTIONS 234

Where You’ve Been and Where You’re Going 235
5.1 Introduction to Normal Distributions and the Standard Normal Distribution 236
5.2 Normal Distributions: Finding Probabilities 249
5.3 Normal Distributions: Finding Values 257
■ Case Study: Birth Weights in America 265
5.4 Sampling Distributions and the Central Limit Theorem 266
■ Activity: Sampling Distributions 280
5.5 Normal Approximations to Binomial Distributions 281
■ Uses and Abuses 291
Chapter Summary 292
Review Exercises 293
Chapter Quiz 297
■ Real Statistics-Real Decisions: Putting It All Together 298
■ Technology: Age Distribution in the United States 299
Cumulative Review: Chapters 3-5 300


PART THREE STATISTICAL INFERENCE


CONFIDENCE INTERVALS 302

Where You’ve Been and Where You’re Going 303
6.1 Confidence Intervals for the Mean (Large Samples) 304
■ Case Study: Marathon Training 317
6.2 Confidence Intervals for the Mean (Small Samples) 318
■ Activity: Confidence Intervals for a Mean 326
6.3 Confidence Intervals for Population Proportions 327
■ Activity: Confidence Intervals for a Proportion 336
6.4 Confidence Intervals for Variance and Standard Deviation 337
■ Uses and Abuses 344
Chapter Summary 345
Review Exercises 346
Chapter Quiz 349
■ Real Statistics-Real Decisions: Putting It All Together 350
■ Technology: Most Admired Polls 351
■ Using Technology to Construct Confidence Intervals 352



HYPOTHESIS TESTING WITH ONE SAMPLE 354

Where You’ve Been and Where You’re Going 355
7.1 Introduction to Hypothesis Testing 356
7.2 Hypothesis Testing for the Mean (Large Samples) 371
■ Case Study: Human Body Temperature: What’s Normal? 386
7.3 Hypothesis Testing for the Mean (Small Samples) 387
■ Activity: Hypothesis Tests for a Mean 397
7.4 Hypothesis Testing for Proportions 398
■ Activity: Hypothesis Tests for a Proportion 403
7.5 Hypothesis Testing for Variance and Standard Deviation 404
■ Uses and Abuses 413
A Summary of Hypothesis Testing 414
Chapter Summary 416
Review Exercises 417
Chapter Quiz 421
■ Real Statistics-Real Decisions: Putting It All Together 422
■ Technology: The Case of the Vanishing Women 423
■ Using Technology to Perform Hypothesis Tests 424


HYPOTHESIS TESTING WITH TWO SAMPLES 426

Where You’ve Been and Where You’re Going 427
8.1 Testing the Difference Between Means (Large Independent Samples) 428
■ Case Study: Readability of Patient Education Materials 441
8.2 Testing the Difference Between Means (Small Independent Samples) 442
8.3 Testing the Difference Between Means (Dependent Samples) 451
8.4 Testing the Difference Between Proportions 461
■ Uses and Abuses 469
Chapter Summary 470
Review Exercises 471
Chapter Quiz 475
■ Real Statistics-Real Decisions: Putting It All Together 476
■ Technology: Tails over Heads 477
■ Using Technology to Perform Two-Sample Hypothesis Tests 478
Cumulative Review: Chapters 6-8 480


PART FOUR MORE STATISTICAL INFERENCE


CORRELATION AND REGRESSION 482

Where You’ve Been and Where You’re Going 483
9.1 Correlation 484
■ Activity: Correlation by Eye 500
9.2 Linear Regression 501
■ Activity: Regression by Eye 511
■ Case Study: Correlation of Body Measurements 512
9.3 Measures of Regression and Prediction Intervals 513
9.4 Multiple Regression 524
■ Uses and Abuses 529
Chapter Summary 530
Review Exercises 531
Chapter Quiz 535
■ Real Statistics-Real Decisions: Putting It All Together 536
■ Technology: Nutrients in Breakfast Cereals 537


CHI-SQUARE TESTS AND THE F-DISTRIBUTION 538

Where You’ve Been and Where You’re Going 539
10.1 Goodness-of-Fit Test 540
10.2 Independence 551
■ Case Study: Fast Food Survey 564
10.3 Comparing Two Variances 565
10.4 Analysis of Variance 574
■ Uses and Abuses 587
Chapter Summary 588
Review Exercises 589
Chapter Quiz 593
■ Real Statistics-Real Decisions: Putting It All Together 594
■ Technology: Teacher Salaries 595



NONPARAMETRIC TESTS

Where You’ve Been and Where You’re Going
11.1 The Sign Test
11.2 The Wilcoxon Tests
■ Case Study: College Ranks
11.3 The Kruskal-Wallis Test
11.4 Rank Correlation
11.5 The Runs Test
■ Uses and Abuses
Chapter Summary
Review Exercises
Chapter Quiz
■ Real Statistics-Real Decisions: Putting It All Together
■ Technology: U.S. Income and Economic Research
Cumulative Review: Chapters 9-11


APPENDICES 


APPENDIX A ALTERNATIVE PRESENTATION OF THE STANDARD 
NORMAL DISTRIBUTION 


Standard Normal Distribution Table (0-to-z) 
Alternative Presentation of the Standard Normal Distribution 


APPENDIX B TABLES

TABLE 1 Random Numbers
TABLE 2 Binomial Distribution
TABLE 3 Poisson Distribution
TABLE 4 Standard Normal Distribution
TABLE 5 t-Distribution
TABLE 6 Chi-Square Distribution
TABLE 7 F-Distribution
TABLE 8 Critical Values for the Sign Test
TABLE 9 Critical Values for the Wilcoxon Signed-Rank Test
TABLE 10 Critical Values for the Spearman Rank Correlation
TABLE 11 Critical Values for the Pearson Correlation Coefficient
TABLE 12 Critical Values for the Number of Runs


APPENDIX C NORMAL PROBABILITY PLOTS AND THEIR GRAPHS 


Answers to the Try It Yourself Exercises 
Answers to the Odd-Numbered Exercises 
Index 

Photo Credits 




PREFACE 


Welcome to Elementary Statistics: Picturing the World, 
Fifth Edition. You will find that this textbook is written 
with a balance of rigor and simplicity. It combines step- 
by-step instruction, real-life examples and exercises, 
carefully developed features, and technology that makes 
statistics accessible to all. 


We are grateful for the overwhelming acceptance of the 
first four editions. It is gratifying to know that our vision 
of combining theory, pedagogy, and design to exemplify 
how statistics is used to picture and describe the world 
has helped students learn about statistics and make 
informed decisions. 


WHAT'S NEW IN THIS EDITION 


The goal of the Fifth Edition was a thorough update of 
the key features, examples, and exercises: 


Examples This edition includes more than 210 exam- 
ples, approximately 50% of which are new or revised. 


Exercises Approximately 50% of the more than 2100 
exercises are new or revised. We’ve also added 75 concep- 
tual and critical thinking exercises throughout the text. 


StatCrunch® Examples New to this edition are
more than 50 StatCrunch Reports. These interactive
reports, called out in the book with a StatCrunch icon,
provide step-by-step instructions for how to use the
online statistical software StatCrunch to solve the
examples. Note: Accessing these reports requires a
MyStatLab or StatCrunch account.


StatCrunch Exercises New to this edition are more 
than 80 exercises that instruct students to solve the exer- 
cise using StatCrunch. This allows students to practice 
the software skills learned in the StatCrunch Examples. 
Note: Solving the exercises using StatCrunch requires a 
MyStatLab or StatCrunch account. 


Extensive Feature Updates Approximately 50% 
of the following key features have been replaced, making 
this edition fresh and relevant to today’s students: 


• Chapter Openers
• Case Studies
• Putting It All Together: Real Statistics-Real Decisions


Revised Content The following sections have been 
changed: 


• Section 2.2, More Graphs and Displays, now
defines misleading graphs.

• Section 2.5, Measures of Position, now defines the
modified boxplot.

• Section 9.1, Correlation, now defines perfect
positive linear correlation and perfect negative linear
correlation.


FEATURES OF THE FIFTH EDITION 


Guiding Student Learning 


Where You've Been and Where You're Going 
Each chapter begins with a two-page visual description of a 
real-life problem. Where You’ve Been shows students how 
the chapter fits into the bigger picture of statistics by 
connecting it to topics learned in earlier chapters. Where 
You’re Going gives students an overview of the chapter, 
exploring concepts in the context of real-world settings. 


What You Should Learn Each section is organized 
by learning objectives, presented in everyday language in 
What You Should Learn. The same objectives are then 
used as subsection titles throughout the section. 


Definitions and Formulas are clearly presented in 
easy-to-locate boxes. They are often followed by 
Guidelines, which explain In Words and In Symbols 
how to apply the formula or understand the definition. 


Margin Features help reinforce understanding: 


• Study Tips show how to read a table, use
technology, or interpret a result or a graph. Round-off
Rules guide the student during calculations.

• Insights help drive home an important
interpretation or connect different concepts.

• Picturing the World Each section contains a
real-life “mini case study” called Picturing the World
illustrating important concepts in the section.
Each feature concludes with a question and can
be used for general class discussion or group
work. The answers to these questions are included
in the Annotated Instructor's Edition.


Examples and Exercises 


Examples Every concept in the text is clearly illus- 
trated with one or more step-by-step examples. Most 
examples have an interpretation step that shows the stu- 
dent how the solution may be interpreted within the 
real-life context of the example and promotes critical 
thinking and writing skills. Each example, which is num- 
bered and titled for easy reference, is followed by a sim- 
ilar exercise called Try It Yourself so students can 
immediately practice the skill learned. The answers to 
these exercises are given in the back of the book, and the 
worked-out solutions are given in the Student's Solutions 
Manual. The Videos on DVD show clips of an instructor 
working out each Try It Yourself exercise. 

StatCrunch Examples New to this edition are more
than 50 StatCrunch Reports. These interactive reports,
called out in the book with a StatCrunch icon, provide
step-by-step instructions for how to use the online statistical
software StatCrunch to solve the examples. Go to
www.statcrunch.com, choose Explore, then Groups, and search for
“Larson Elementary Statistics 5/e” to access the StatCrunch
Reports. Note: Accessing these reports requires a
MyStatLab or StatCrunch account.


Technology Examples Many sections contain a worked 
example that shows how technology can be used to calculate 
formulas, perform tests, or display data. Screen displays from 
MINITAB®, Excel®, and the TI-83/84 Plus graphing 
calculator are given. Additional screen displays are presented 
at the ends of selected chapters, and detailed instructions are 
given in separate technology manuals available with the book. 


Exercises The Fifth Edition includes more than 2100 exer- 
cises, giving students practice in performing calculations, 
making decisions, providing explanations, and applying 
results to a real-life setting. Approximately 50% of these 
exercises are new or revised. The exercises at the end of 
each section are divided into three parts: 


• Building Basic Skills and Vocabulary are short
answer, true or false, and vocabulary exercises
carefully written to nurture student understanding.

• Using and Interpreting Concepts are skill or word
problems that move from basic skill development to
more challenging and interpretive problems.

• Extending Concepts go beyond the material
presented in the section. They tend to be more
challenging and are not required as prerequisites for
subsequent sections.


For the sections that contain StatCrunch examples, there are 
corresponding StatCrunch exercises that direct students to use 
StatCrunch to solve the exercises. Note: Using StatCrunch 
requires a MyStatLab or StatCrunch account. 


Technology Answers Answers in the back of the book 
are found using tables. Answers found using technology are 
also included when there are discrepancies due to rounding. 


Review and Assessment 


Chapter Summary Each chapter concludes with a 
Chapter Summary that answers the question What did you 
learn? The objectives listed are correlated to Examples in 
the section as well as to the Review Exercises. 


Chapter Review Exercises A set of Review Exercises 
follows each Chapter Summary. The order of the exercises 
follows the chapter organization. Answers to all odd-num- 
bered exercises are given in the back of the book. 




Chapter Quizzes Each chapter ends with a Chapter 
Quiz. The answers to all quiz questions are provided in the 
back of the book. For additional help, see the step-by-step 
video solutions on the companion DVD-ROM. 


Cumulative Review A Cumulative Review at the end 
of Chapters 2, 5, 8, and 11 concludes each part of the text.
Exercises in the Cumulative Review are in random order 
and may incorporate multiple ideas. Answers to all odd- 
numbered exercises are given in the back of the book. 


Statistics in the Real World 


Uses and Abuses: Statistics in the Real World 
Each chapter features a discussion on how statistical tech- 
niques should be used, while cautioning students about 
common abuses. The discussion includes ethics, where 
appropriate. Exercises help students apply their knowledge. 


Applet Activities Selected sections contain activities 
that encourage interactive investigation of concepts in the 
lesson with exercises that ask students to draw conclusions. 
The accompanying applets are contained on the DVD that 
accompanies new copies of the text. 


Chapter Case Study Each chapter has a full-page Case 
Study featuring actual data from a real-world context and 
questions that illustrate the important concepts of the chapter. 


Putting It All Together: Real Statistics—Real 
Decisions This feature encourages students to think crit- 
ically and make informed decisions about real-world data. 
Exercises guide students from interpretation to drawing of 
conclusions. 


Chapter Technology Project Each chapter has a 
Technology project using MINITAB, Excel, and the TI- 
83/84 Plus that gives students insight into how technology is 
used to handle large data sets or real-life questions. 


CONTINUED STRONG PEDAGOGY 
FROM THE FOURTH EDITION 


Versatile Course Coverage The table of contents was 
developed to give instructors many options. For instance, the 
Extending Concepts exercises, applet activities, Real 
Statistics-Real Decisions, and Uses and Abuses provide suf- 
ficient content for the text to be used in a two-semester 
course. More commonly, we expect the text to be used in a 
three-credit semester course or a four-credit semester course 
that includes a lab component. In such cases, instructors will 
have to pare down the text’s 46 sections. 


Graphical Approach As with most introductory statis- 
tics texts, we begin the descriptive statistics chapter 
(Chapter 2) with a survey of different ways to display data 
graphically. A difference between this text and many others 
is that we continue to incorporate the graphical display of
data throughout the text. For example, see the use of stem- 
and-leaf plots to display data on pages 385 and 386. This 
emphasis on graphical displays is beneficial to all students, 
especially those utilizing visual learning strategies. 


Balanced Approach The text strikes a balance among 
computation, decision making, and conceptual understand- 
ing. We have provided many Examples, Exercises, and Try It 
Yourself exercises that go beyond mere computation. 


Variety of Real-Life Applications We have chosen 
real-life applications that are representative of the majors of 
students taking introductory statistics courses. We want
statistics to come alive and appear relevant to students so they
understand the importance of and rationale for studying
statistics. We want the applications to be authentic, but
they also need to be accessible. See the Index of
Applications on page XVI.


Data Sets and Source Lines The data sets in the book 
were chosen for interest, variety, and their ability to illus- 
trate concepts. Most of the 240-plus data sets contain real 
data with source lines. The remaining data sets contain 
simulated data that are representative of real-life situations. 
All data sets containing 20 or more entries are available in 
a variety of formats; they are available electronically on the 
DVD and Internet. In the exercise sets, the data sets that are
available electronically are indicated by an icon.


Flexible Technology Although most formulas in the 
book are illustrated with “hand” calculations, we assume 
that most students have access to some form of technology 
tool, such as MINITAB, Excel, the TI-83 Plus, or the TI-84 
Plus. Because the use of technology varies widely, we have 
made the text flexible. It can be used in courses with no 
more technology than a scientific calculator—or it can be 
used in courses that require sophisticated technology tools. 
Whatever your use of technology, we are sure you agree 
with us that the goal of the course is not computation. 
Rather, it is to help students gain an understanding of the 
basic concepts and uses of statistics. 


Prerequisites Algebraic manipulations are kept to a 
minimum—often we display informal versions of formulas 
using words in place of or in addition to variables. 


SUPPLEMENTS 


STUDENT RESOURCES 


Student Solutions Manual Includes complete worked-out solu- 
tions to all of the Try It Yourself exercises, the odd-numbered 
exercises, and all of the Chapter Quiz exercises. 

(ISBN-13: 978-0-321-69373-0; ISBN-10: 0-321-69373-6) 


Choice of Tables Our experience has shown that students
find a cumulative distribution function (CDF) table easier to use
than a “0-to-z” table. Using the CDF table to find the area 
under a normal curve is a topic of Section 5.1 on pages 
239-243. Because we realize that some teachers prefer to use 
the “0-to-z” table, we have provided an alternative presenta- 
tion of this topic using the “0-to-z” table in Appendix A. 
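
As a rough illustration of how the two table styles relate (a sketch, not taken from the text): a CDF table lists the area to the left of z under the standard normal curve, while a “0-to-z” table lists the area between 0 and z. The function names below are hypothetical, and the areas are computed with Python's math.erf rather than read from the printed tables.

```python
import math


def cdf(z: float) -> float:
    """Area to the left of z under the standard normal curve (CDF-table value)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def zero_to_z(z: float) -> float:
    """Area between 0 and |z| (the value a 0-to-z table would list)."""
    return cdf(abs(z)) - 0.5


def cdf_from_zero_to_z(z: float) -> float:
    """Recover the left-tail area from a 0-to-z value using symmetry."""
    area = zero_to_z(z)
    # Add the left half of the curve for z >= 0; subtract from it for z < 0.
    return 0.5 + area if z >= 0 else 0.5 - area


if __name__ == "__main__":
    for z in (-1.15, 0.0, 1.96):
        print(f"z = {z:5.2f}  CDF = {cdf(z):.4f}  0-to-z = {zero_to_z(z):.4f}")
```

With a CDF table the left-tail area is read directly. With a 0-to-z table the reader must add or subtract 0.5 depending on the sign of z, which is the extra step the symmetry function above makes explicit.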


Page Layout Statistics is more accessible when it is care- 
fully formatted on each page with a consistent open layout. 
This text is the first college-level statistics book to be writ- 
ten so that its features are not split from one page to the 
next. Although this process requires extra planning, the 
result is a presentation that is clean and clear. 


MEETING THE STANDARDS 


MAA, AMATYC, NCTM Standards This text answers 
the call for a student-friendly text that emphasizes the uses 
of statistics. Our job as introductory instructors is not to 
produce statisticians but to produce informed consumers of 
statistical reports. For this reason, we have included exercis- 
es that require students to interpret results, provide written 
explanations, find patterns, and make decisions. 


GAISE Recommendations Funded by the American 
Statistical Association, the Guidelines for Assessment and 
Instruction in Statistics Education (GAISE) Project devel- 
oped six recommendations for teaching introductory statis- 
tics in a college course. These recommendations are: 


• Emphasize statistical literacy and develop statistical
thinking.

• Use real data.

• Stress conceptual understanding rather than mere
knowledge of procedures.

• Foster active learning in the classroom.

• Use technology for developing conceptual
understanding and analyzing data.

• Use assessments to improve and evaluate student learning.


The examples, exercises, and features in this text embrace all of 
these recommendations. 


Videos on DVD-ROM A comprehensive set of videos tied to the 
textbook, containing short video clips of an instructor working 
every Try It Yourself exercise. New to this edition are section 
lecture videos. 

(ISBN-13: 978-0-321-69374-7; ISBN-10: 0-321-69374-4) 

A Companion DVD-ROM is bound in new copies of Elementary
Statistics: Picturing the World. The DVD holds a number of
supporting materials, including:


• Chapter Quiz Prep: video solutions to Chapter Quiz
questions in the text, with English and Spanish captions

• Data Sets: selected data sets from the text, available in
Excel, MINITAB (v.14), TI-83 / TI-84 and txt (tab delimited)

• Applets: 15 applets by Webster West

• DDXL: an Excel add-in


Graphing Calculator Manual Tutorial instruction and worked out 
examples for the TI-83/84 Plus graphing calculator. 
(ISBN-13: 978-0-321-69379-2; ISBN-10: 0-321-69379-5) 


Excel Manual Tutorial instruction and worked-out examples for 
Excel. (ISBN-13: 978-0-321-69380-8; ISBN-10: 0-321-69380-9) 
Minitab Manual Tutorial instruction and worked-out examples for 
Minitab. (ISBN-13: 978-0-321-69377-8; ISBN-10: 0-321-69377-9) 


Study Cards for the following statistical software products are 
available: Minitab, Excel, SPSS, JMP, R, StatCrunch, and the TI- 
83/84 Plus graphing calculator. 


INSTRUCTOR RESOURCES 


Annotated Instructor's Edition Includes suggested activities, addi- 
tional ways to present material, common pitfalls, alternative for- 
mats or approaches, and other helpful teaching tips. All answers to 
the section and review exercises are provided with short answers 
appearing in the margin next to the exercise. 

(ISBN-13: 978-0-321-69365-5; ISBN-10: 0-321-69365-5) 

Instructor Solutions Manual Includes complete solutions to all of 
the exercises, Try It Yourself exercises, Case Studies, Technology 
pages, Uses and Abuses exercises, and Real Statistics-Real 
Decisions exercises. 

(ISBN-13: 978-0-321-69366-2; ISBN-10: 0-321-69366-3) 

TestGen® (www.pearsoned.com/testgen) Enables instructors to
build, edit, print, and administer tests using a computerized
bank of questions developed to cover all the objectives of the text. 
TestGen is algorithmically based, allowing instructors to create 
multiple but equivalent versions of the same question or test with 
the click of a button. Instructors can also modify test bank ques- 
tions or add new questions. The software and testbank are avail- 
able for download from Pearson Education’s online catalog 
(www.pearsonhighered.com/irc). 


Online Test Bank A test bank derived from TestGen® available 
for download at www.pearsonhighered.com/irc. 


PowerPoint Lecture Slides Fully editable and printable slides that 
follow the textbook. Use during lecture or post to a website in an 
online course. Most slides include notes offering suggestions for 
how the material may effectively be presented in class. These 
slides are available within MyStatLab or at www.pearsonhigh- 
ered.com/irc. 


Active Learning Questions Prepared in PowerPoint®, these 
questions are intended for use with classroom response systems. 
Several multiple-choice questions are available for each chapter 
of the book, allowing instructors to quickly assess mastery of 
material in class. The Active Learning Questions are available 
to download from within MyStatLab or at 
www.pearsonhighered.com/irc. 


TECHNOLOGY SUPPLEMENTS


MyStatLab™ Online Course (access code required) 


MyStatLab is a series of text-specific, easily customizable online 
courses for Pearson Education's textbooks in statistics.
MyStatLab™ provides students with a personalized interactive
learning environment that adapts to each student's learning
style and gives them immediate feedback and help. Because
MyStatLab is delivered over the Internet, students can learn at 
their own pace and work whenever they want. MyStatLab provides 
instructors with a rich and flexible set of text-specific resources, 
including course management tools, to support online, hybrid, or 
traditional courses. MyStatLab is available to qualified adopters 
and includes access to StatCrunch. For more information, visit 
www.mystatlab.com or contact your Pearson representative. 


MathXL® for Statistics Online Course 
(access code required) 
MathXL® for Statistics is a powerful online homework, tutorial, 
and assessment system that accompanies Pearson textbooks in 
statistics. With MathXL for Statistics, instructors can: 
• Create, edit, and assign online homework and tests using 
algorithmically generated exercises correlated at the objective 
level to the textbook. 

• Create and assign their own online exercises and import 
TestGen tests for added flexibility. 

• Maintain records of all student work, tracked in MathXL’s 
online gradebook. 


With MathXL for Statistics, students can: 


• Take chapter tests in MathXL and receive personalized 
study plans and/or personalized homework 
assignments based on their test results. 

• Use the study plan and/or the homework to link directly to 
tutorial exercises for the objectives they need to study. 

• Access supplemental animations and 
video clips directly from selected exercises. 


MathXL for Statistics is available to qualified adopters. For more 
information, visit our website at www.mathxl.com, or contact your 
Pearson representative. 


StatCrunch® 


StatCrunch® is an online statistical software website that allows 
users to perform complex analyses, share data sets, and generate 
compelling reports of their data. Developed by programmers and 
statisticians, StatCrunch currently has more than twelve thousand 
data sets available for students to analyze, covering almost any 
topic of interest. Interactive graphics are embedded to help users 
understand statistical concepts and are available for export to 
enrich reports with visual representations of data. Additional 
features include: 
• A full range of numerical and graphical methods that allow 
users to analyze and gain insights from any data set. 
• Flexible upload options that allow users to work with their 
txt or Excel® files, both online and offline. 
• Reporting options that help users create a wide variety of 
visually appealing representations of their data. 
StatCrunch is available to qualified adopters. For more information, 
visit our website at www.statcrunch.com, or contact your Pearson 
representative. 


STUDY STRATEGIES 


Congratulations! You are about to begin your study of 
Statistics. As you progress through the course, you 
should discover how to use statistics in your everyday 
life and in your career. The prerequisites for this course 
are two years of algebra, an open mind, and a willing- 
ness to study. When you are studying statistics, the mate- 
rial you learn each day builds on material you learned 
previously. There are no shortcuts—you must keep up 
with your studies every day. Before you begin, read 
through the following hints that will help you succeed. 


Making a Plan Make your own course plan right now! A good 
rule of thumb is to study at least two hours for every hour in class. 
After your first major exam, you will know if your efforts were suf- 
ficient. If you did not get the grade you wanted, then you should 
increase your study time, improve your study efficiency, or both. 


Preparing for Class Before every class, review your notes from 
the previous class and read the portion of the text that is to be cov- 
ered. Pay special attention to the definitions and rules that are 
highlighted. Read the examples and work through the Try It 
Yourself exercises that accompany each example. These steps 
take self-discipline, but they will pay off because you will bene- 
fit much more from your instructor’s presentation. 


Attending Class Attend every class. Arrive on time with your 
text, materials for taking notes, and your calculator. If you must 
miss a class, get the notes from another student, go to a tutor or 
your instructor for help, or view the appropriate Video on DVD. 
Try to learn the material that was covered in the class you missed 
before attending the next class. 


Participating in Class When reading the text before class, review- 
ing your notes from a previous class, or working on your homework, 
write down any questions you have about the material. Ask your 
instructor these questions during class. Doing so will help you (and 
others in your class) understand the material better. 


Taking Notes During class, be sure to take notes on definitions, 
examples, concepts, and rules. Focus on the instructor’s cues to 
identify important material. Then, as soon after class as possible, 
review your notes and add any explanations that will help to make 
your notes more understandable to you. 


[Figure: a sheet of note paper. Callouts: "Draw a vertical line on 
your note paper." "After class, reread your notes and write 
comments, questions, or explanations here."] 

Doing the Homework Learning statistics is like learning to 
play the piano or to play basketball. You cannot develop skills just 
by watching someone do it; you must do it yourself. The best time 
to do your homework is right after class, when the concepts are 
still fresh in your mind. Doing homework at this time increases 
your chances of retaining the information in long-term memory. 


Finding a Study Partner When you get stuck on a problem, you 
may find that it helps to work with a partner. Even if you feel you 
are giving more help than you are getting, you will find that teach- 
ing others is an excellent way to learn. 


Keeping Up with the Work Don’t let yourself fall behind in 
this course. If you are having trouble, seek help immediately—from 
your instructor, a statistics tutor, your study partner, or additional 
study aids such as the Chapter Quiz Prep videos and the 
Try It Yourself video clips on the DVD-ROM. 
Remember: If you have trouble with one section of your statistics 
text, there’s a good chance that you will have trouble with later 
sections unless you take steps to improve your understanding. 


Getting Stuck Every statistics student has had this experience: 
You work a problem and cannot solve it, or the answer you get 
does not agree with the one given in the text. When this happens, 
consider asking for help or taking a break to clear your thoughts. 
You might even want to sleep on it, or rework the problem, or 
reread the section in the text. Avoid getting frustrated or spending 
too much time on a single problem. 


Preparing for Tests Cramming for a statistics test seldom works. 
If you keep up with the work and follow the suggestions given here, 
you should be almost ready for the test. To prepare for the chapter 
test, review the Chapter Summary and work the Review Exercises 
and the Cumulative Review Exercises. Then set aside some time to 
take the sample Chapter Quiz. Analyze the results of your Chapter 
Quiz to locate and correct test-taking errors. 


Taking a Test Most instructors do not recommend studying 
right up to the minute the test begins. Doing so tends to make 
people anxious. The best cure for test-taking anxiety is to pre- 
pare well in advance. Once the test begins, read the directions 
carefully and work at a reasonable pace. (You might want to 
read the entire test first, then work the problems in the order in 
which you feel most comfortable.) Don’t rush! People who hurry 
tend to make careless errors. If you finish early, take a few 
moments to clear your thoughts and then go over your work. 


Learning from Mistakes After your test is returned to you, go 
over any errors you might have made. Doing so will help you 
avoid repeating some systematic or conceptual errors. Don’t dis- 
miss any error as just a “dumb mistake.” Take advantage of any 
mistakes by hunting for ways to improve your test-taking skills. 


INDEX OF APPLICATIONS 


Biology and Life 
Sciences 


Air pollution, 31, 471 
Air quality, 115 
Alligator, 125 
Atlantic croaker fish, 49 
Beagle, 48, 253 
Box turtle, 235 
Brown trout, 220 
Cats, 182, 259, 402, 447 
Dogs, 142, 153, 182, 198, 200, 259, 
402, 447 
Elephants, 450, 527 
Endangered species, 590 
Environmentally friendly 
product, 287 
Fish, 13, 526 
Fisher’s Iris data set, 58 
Florida panther, 339 
Fruit flies, 110 
Green turtle migration, 294-295 
House flies, 62, 276 
Kitti’s hog-nosed bat, 294, 296 
Koalas, 30 
Oats, 629 
Ostrich, 481 
Pets, 94, 215 
Plants, 215 
Rabbits, 220 
Salmon, 135, 147 
Sharks, 230 
Snapdragon flowers, 142 
Soil, 175, 570 
Soybeans, 24 
Swans, 369 
Threatened species, 590 
Trees, 13, 48, 170, 270, 368, 522, 
527 
Vertebrate groups, 590 
Veterinarian, 447 
Waste, 324, 389, 394 
Water, 175, 385 
conductivity, 391 
consumption, 95 
hardness, 343 
pH level, 391 
Wheat, 591, 629 


Business 


Advertisements, 228, 400, 585 

Advertising and sales, 516 

Bankruptcies, 223 

Beverage company, 143 

Board of directors, 169 

Book sales, nonfiction, 14 

Bookbinding defects, 154 

Chief financial officers, survey of, 
212 

Clothing store purchases, 212 

Consumer ratings, 459 

Defective parts, 162, 176, 181, 
185, 223 

Executives, 111, 184 

Fortune 500 companies, 29, 191 

Free samples, 402 

Inventory shrinkage, 57 


Manufacturer 
claims, 246 
earnings, 109 
Product assembly, 175 
Quality control, 7, 31, 34, 35, 129, 
198 
Sales, 2, 50, 64, 119, 158, 192, 194, 
195, 222, 375, 520, 521, 528, 
578 
Salesperson, 76, 107 
Shipping errors, 368 
Small business 
owners, 214 
websites, 208 
Telemarketing, 190 
Wal-Mart shareholder’s equity, 
528 
Warehouses, 154, 177 
Website costs, 343 


Combinatorics 


Answer guessing, 154, 212 
Area code, 176 

Letters, 172, 175 

License plates, 131, 176, 181 
Password, 174, 176 

Security code, 169, 184, 185 


Computers 

Computer, 7, 8, 201, 209, 253-255, 
332, 368 

Computer software engineer 
earnings, 343 

Disk drive, 586 

Internet, 31, 70, 152, 183, 229, 
287, 288, 369, 399, 429, 467, 
497, 498, 506, 560 

Microchips, 224 

Monitor, 273, 445 

Mozilla® Firefox®, 286 

Operating system, 7 

Printers, 96 

Security, 297 

Social networking sites, 61, 197, 
203, 304, 306-308, 310, 605 

Typing speed, 74 

Videos, online, 349 

Website, visitors per day, 600 

Windows® Internet Explorer®, 
286 


Demographics 


Age, 6, 25, 29, 31, 60, 76, 134, 157, 
161, 496, 544, 546, 558 

Birth weights in America, 265 

Bride’s age, 90, 608 

Cars per household, 94 

Children per household, 88 

City rent, 296 

Drive to work, 197 

Ear wiggling, 153 

Education, 593 

Employee, 21, 134, 136, 141, 176, 
177, 179, 181, 184, 198, 276, 
524-526 

Eye color, 13, 150, 157 


Groom’s age, 608 
Height, 6, 13, 486, 496, 649 
of men, 77, 86, 110, 253, 263, 
277, 507, 515 
of women, 48, 86, 253, 263, 277, 
401 
Home, 7 
Household, 200, 300, 419, 473 
Left-handed, 164 
Marriage, 5, 29 
Most admired polls, 351 
Moving out, 468 
New car, 130 
New home prices, 119 
Population 
Alaska, 87 
Brazil, 96 
cities, fastest growing, 1 
cities, largest numerical 
increase, 1 
Florida, 87 
U.S., 9, 95 
West Ridge County, 20-22 
Retirement age, 51 
Shoe size, 49, 507, 515 
U.S. age distribution, 163, 181, 
299 
U.S. unemployment rate, 116 
Weight of newborns, 13, 238, 480, 
A6 
Zip codes, 29 


Earth Science 


Alternative energy, 333, 349 
Carbon footprint, 300 
Clear days, May, San Francisco, 
CA, 210 
Climate conditions, 189 
Cloudy days, June, Pittsburgh, 
PA, 210 
Cyanide levels, 350 
Earthquakes, 260, 499 
Global warming, 3, 334 
Green products, 75 
Hurricane, 200, 223, 231 
relief efforts, 24 
Ice thickness, 61 
Lightning strikes, 231 
Nitrogen dioxide, 384 
Old Faithful, Yellowstone 
National Park, 44, 94, 279, 
486, 489, 491, 503, 504, 514 
Precipitation 
Baltimore, MD, 223 
Orlando, FL, 12 
San Francisco, CA, 343 
Tampa, FL, 222 
Rain, 645 
Saffir-Simpson Hurricane Scale, 
231 
Seawater, 312 
Snowfall, 636 
January average, 14 
Mount Shasta, CA, 224 
New York county, 275 
Nome, AK, 197 
Sunny and rainy days, 189, 193 
Seattle, WA, 140 
Temperature, 63, 638 
Cleveland, OH, 47 
Denver, CO, 12 
Mohave, AZ, 29 
Pittsburgh, PA, 604 
San Diego, CA, 605 

Tornadoes, 125, 197 

UV Index, 62 

Water contamination, 350 

Water temperatures, 622 

Wet or dry, Seattle, WA, 140 

Wildland fires, 531 


Economics and Finance 


Account balance, 75 

Accounting, 199 

Allowance, 589 

ATM machine, 50, 52 

Audit, 133, 162, 335 

Bill payment, 332, 550 

Book spending, 49 

Charitable donations, 332, 499 

Children’s savings accounts, 296 

Commission, 113 

Credit card, 29, 112, 213, 273, 347, 
396, 432, 605 

Credit score, 475, 641 

Debit card, 31 

Debt and income, 628 

Depression, 14 

Dividends and earnings, 497, 498 

Dow Jones Industrial Average, 77 

Economic power, 8 

Emergency savings, 153 

Executive compensation, 417 

Financial advice, 213 

Financial debt, 605 

Financial shape, 177 

Forecasting earnings, 5 

Gross domestic product, 485, 488, 
493, 502, 504, 514, 516, 518, 
523 

Home owner income, 7 

Honeymoon financing, 213 

Income, 125, 496, 592, 649 

Investments, 63 

IRAs, 521, 522 

IRS tax filing wait times, 394 

Manufacturing, 63 

Missing tax deductions, 347, 348 

Money managing, 7 

Mortgages, 324 

Mutual funds, 306, 648 

Paycheck errors, 224 

Primary investor in household, 7 

Profit and loss analysis, 199 

Raising a child, cost, 380, 418 

Restaurant spending, 96, 437 

Retirement income, 213 

Salaries, 4, 6, 7, 29, 31, 48, 63, 64, 
72, 75, 80-83, 91, 97, 117, 119, 
124, 201, 276, 296, 349, 379, 
383, 394, 395, 412, 431, 439, 
440, 471, 508, 523, 524-526, 
535, 572, 582, 583, 586, 595, 
615, 616, 622, 623, 643 

Baltimore, MD, 593 
Boston, MA, 92 


Chicago, IL, 83, 92 
Dallas, TX, 92 
Jacksonville, FL, 593 
New York, NY, 92 
San Francisco, CA, 593 
Savings, 213, 548 
more money, 348 
Spending before traveling, 89 
Stock, 113, 143, 181, 228, 229, 315, 
519, 521, 570 
McDonald’s, 535 
Stock market, 144 
Tax preparation methods, 540, 
541, 543 
Taxes, 521, 522 
U.S. exports, 77 
U.S. income and economic 
research, 647 
Utility bills, 105, 254, 255 
Vacation cost, 7, 395, 418, 421, 433 


Education 


Achievement, 558, 593 

ACT, 8, 247, 253, 294, 437 

Ages of students, 68, 291, 309, 
313, 314, 480 

Alumni, annual contributions by, 
485, 489, 491, 503 

Books, 197, 312 

Business schools, 10 

Career counselors, 124 

Class size, 395, 618 

Classes, 181 

College costs, 548, 649 

College graduates, 288 

jobs, 17 

College president, 31 

College professors, 162 

College students, 161 

per faculty member, 115 

Continuing education, 560 

Day care, 439 

Degrees, 56, 400 

Degrees and gender, 185 

Doctorate degree, 627, 642, 643 

Dormitory room prices, 117 

Education, study plans, 468 

Educational attainment and 
work location, 553 

Elementary school students, 183, 
431 

Enrollment, 201, 227, 466, 
626-627 

Expenditure per student, 419 

Extracurricular activities, 199, 
358, 363, 364 

Faculty hours, 394, 395 

Final exam, 428 

Final grade, 75, 76, 525, 526 

Financial aid, 561 

Genders of students, 634 

GPA, 60, 74, 146, 324, 485, 494, 
529, 579, 586 

Health-related fields, study plans, 
468 

Highest level, 141 

Homework, 325 

Law school, 396, 642 

Mathematics assessment test, 444 

MCAT scores, 49, 73, 383 

Medical school, 149 

Midterm scores, 428, 525, 526 

Musical training, 443 

Nursing major, 152, 156, 164 

Online classes, opinion, 30 

Physics minors, 29 

Plus/minus grading, 182 

Preschool, 439 

Public schools, 163, 535 

Quiz, 139, 199, 203 

Recess, 124 

Reliability of testing, 155 

SAT scores, 4, 52, 92, 104, 201, 
247, 252, 254, 278, 324, 421, 
434, 456, 481, 529, 591, 606, 
607 

Scholarship, 182 

Science assessment tests, 411, 
475, 572 

Secondary school teachers, 431 

Student advisory board, 172 

Student-athletes, 197 

Student ID numbers, 13, 137 

Student loans, 496, A29 

Student safety, 558 

Student sleep habits, 326 

Study habits, 30, 438, 497, 498, 
506 

Teaching experience, 301, 590 

Teaching methods, 449, 472 

Test grades/scores, 51, 60, 61, 63, 
69, 72, 75, 76, 107, 109, 110, 
116, 118, 125, 138, 238, 264, 
428, 497, 498, 506, 550, 638, 
649 

Test scores and GNI, 629 

Tuition, 73, 101, 102, 369 

U.S. history assessment tests, 411, 
572 

Vocabulary, 496 


Engineering 
Aerospace engineers, 349 
Bolts, 341, 420 
Brick mortar, 50 
Building heights 
Atlanta, GA, 506 
Houston, TX, 115 
Cooling capacity, 505 
Flow rate, 368 
Gears, 256 
Horsepower, 31 
Liquid dispenser, 256, 631 
Machine 
calibrations, 278 
part supplier, 140 
Nails, 256 
Nut, 438 
Petroleum engineering, 4 
Plastic injection mold, 592 
Plastic sheet cutting, 314 
Repairs, 176 
Resistors, 293 
Tensile strength, 448 
Washers, 255, 438 


Entertainment 


Academy Award, winning, 133 
Best-selling novel, 133 
Blu-ray™ players, 342, 644 
Broadway tickets, 14 
Concert, 634 

attendance, 197 

tickets, 73, 289 
Game show, 138 
Games of chance, 143, 199, 200 
Home theater system, 312, 358, 
363, 572 
Horse race, 176 
Lottery, 139, 172, 175, 177, 179, 
212, 224 
Magazine, 8, 116, 184 
Monopoly game, 146 
Motion Picture Association, 
ratings, 12, 161 
Movie ticket prices, 117 
Movies, 25, 31, 150, 183-184, 296, 
558, 559 
budget and gross, 497 
on phone, 213 
MP3 player, 13, 275, 368 
Netbook, 185 
New Year’s Eve, 116 
News, 289 
Nielsen Company ratings, 15, 25 
Oscar winners, ages, 106, 112 
Political blog, 49 
Powerball lottery, 186 
Radio stations, 118 
Raffle ticket, 135, 184, 196 
Reading, 207, 333 
Rock concert, fan age, 66 
Satellite television, 117 
Song lengths, 111 
Summer vacation, 151 
Television, 6, 10, 12, 108, 109, 118, 
199, 228, 437, 439, 532, 533 
3D TV, 282, 285 
HDTV, 282, 284 
late night, 582 
LCD TV, 342 
networks, Pittsburgh, PA, 10 
The Price Is Right, 126, 127 
top-ranked programs, 15 
Video games, 31, 57, 174, 209, 368 


Food and Nutrition 


Apple, 61, 264 

Beef, 627 

Caffeine, 95, 384, 494, 624 

Calories, 368, 507, 508, 572 

Candy, 601 

Carbohydrates, 316, 411, 573 

Carrots, 264 

Cereal, 247, 507, 537 

Cheese, 314 

Chicken, 627 

Chicken wings, 228 

Coffee, 77, 95, 159, 276, 320-321, 
383, 546 

Cookies, 213 

Corn, toxin, 173 

Dark chocolate, 413 

Delivery, 547 

Dried fruit, 417 

Energy bar, 417 

Fast food, 230, 384, 564 

Fat, 505, 508 

Fat substitute, 23 

Food away from home, money 
spent on, 419 

Fruit consumption, 295 

Hot chocolate, 508 

Hot dogs, 206, 507 

Ice cream, 264, 277, 333, 551, 552, 
554-555, 572 
Jelly beans, 182 
Juice drinks, 312 
Leftovers, 331 
M&M’s, 229, 544-545 
Meat consumption, 295 
Melons, 421 
Menu, 139, 175 
Milk 
consumption, 246 
containers, 277 
processing, 407 
production, 121, 532, 533 
Multivitamin, 287 
Oranges, 264 
Peanuts, 255 
Pepper pungencies, 50 
Pizza, 176 
Potatoes, 73, 527 
Protein, 505 
Restaurant, 466, 557, 569 
Burger King, 573 
Long John Silver’s, 471 
McDonald’s, 573 
serving, 420 
Wendy’s, 471 
Rye, 527 
Salmonella, 359 
Saturated fat intake, 51 
Sodium, 316, 417, 471, 507 
Soft drinks, 255, 422 
Sports drink, 369, 407 
Storing fish, 4 
Sugar, 507, 531, 532 
Supermarket, 95, 250 
Tea drinker, 143 
Vegetables, 276, 421 
Vending machine, 264 
Water, 314, 383, 496 


Government 


Better Business Bureau, 57 
Congress, 167, 335 
gender profile, 14, 161 
issue when voting, 7 
Department of Energy, gas 
Federal bailout, 417 
energy, 349 
Governor, Democrats, 8 
29 
Legal system in U.S., 359 
Registered voters, 6, 35 
Commission, 35 
Allergy medicines, 23, 340 
Appetite suppressant, 452 
Arthritis, 24, 469 


XVIII 


Assisted reproductive 
technology, 152, 232 
Asthma, 401 
Bacteria vaccine, 27 
Calcium supplements, 473, 615, 
Chronic medications, 481 
466 
Health care costs, 347, 348 
Health care visits, 369, 541 
Heart disease, 413 
Heart medication, 369 
Heart rate, 11, 75, 270, 322, 434, 
616 
Heart rhythm abnormality, 7 
Heart transplant, 263, 279, 450 
Herbal medicine, 460, 642 
HIV, 420 
Hospital, 52 
Hospital beds, 76 
Hospital costs, 412 
Hospital length of stay, 77, 325, 
Physician’s intake form, 14 
Physicians, leaving medicine, 29 
Placebo, 557, 559, 562 
Plantar heel pain, 465 
Plaque buildup in arteries, 458 
Pregnancy study, Cebu, 
457, 459, 473, 496 


Housing and 
Construction 


City house value, 296 

Construction, 322 

Home insurance, 623 

House size, 358, 363, 365, 549 
136, 138, 139, 140, 143, 146, 
147, 150, 156-158, 162, 166, 
181, 183, 185, 197 

Digital camera, 342, 348 
DVDs, 170, 275 
Electricity consumption, 384 

Electricity cost, 592 
Energy cost, 584, 586 

Energy efficiency, 505 
Farm values, 93, 94, 296, 504 
Floral arrangement, 301 

Fluorescent lamps, 384 

Full-body scanner, 413 

Furnaces, 358, 363 

Furniture store, 368 

Garden hose, 368 
Gas station, 118, 227 
Lawn mowers, 369 
Light bulbs, 325, 368, 384 

Liquid volume of cans, 115 

Living on your own survey, 73 

Marbles, 203 

Memory, 8 

Metacarpal bone length, 649 

Metal detector, 227 

Microwave, 421 

Middle initial, 138 

Mozart, 187 

Museum attendance, 601 

Music downloads, 355 

Nail polish, 333 

NASA budget, 62 

Natural gas, 2, 648 

Nitrogen oxides emission, 536 

Nuclear power plants, 100, 102, 

103 

Number generator, 638 

Oil, 29, 64, 520, 521 

Opinion poll, 13 

Pages, section, 228 


Paint, 368, 644 
cans, 277, 314 
sprayer, 315 
Pilot’s test, 222 
Power failure, 73 
30 
Questionnaire, 7 
Recycling, 332 
Refrigerator, 312, 313, 369 
Salesperson, 13 
Smartphones, 35 
Social Security numbers, 29 
Socks, 182 
Space shuttle 
flights, 118 
fuel, 231 
menu, 175 
speed, 191 
Speed of sound, 497 
Spinner, 137, 140 
Spray-on water repellent, 611 
Spring break, 7, 30 
Sprinkler system, 383 
State troopers, 51 
Airplane accidents, 217 
Alcohol-related accidents, 561 
Emergency response time, 49, 
Motor vehicle 
casualties, 197, 562, 590 
crashes, 63 
Airplanes, 17, 74, 109, 118 
ATV, 368 
Auto parts, 224, 301 
Automobile insurance, 390, 614 
battery, 301, 341, 358, 363, 581 
Blood alcohol content, drivers, 30 
Brakes, 278 
Bumper, 447 
Bus, 644 
Car accident, 146, 193, 496 
Car occupancy, 163, 200, 220 
Carry-on luggage, 74, 222 
Crash test, 447, 539 
Department of Motor Vehicles 
wait times, 392, 467 
Diesel engines, 230 
Drivers, 61, 330 
Driver’s license exam, 182 
Driving time, 271-272 
Flights, 155 
Fuel additive, 617 
Motorcycles, 343 
fuel economy, 118, 466 
helmet usage, 23, 466, 467 
New highway, 171 
Oil change, 346, 358, 363, 395 
Oil tankers, 223 
Parking ticket, 150 
Pickup trucks, 151 
Pit stop, 374 
Power boats, 434 
Price of a car, 9, 390, 533 
Public transportation, 289 
Speed of vehicles, 60, 73, 105, 
112, 249, 374 
Sports cars, 73, 325, 347 
Taxi cab, 369 
Traffic congestion, 334 
President’s approval ratings, 17 
Supreme Court justice, ages, 117 
Curling, 402 
Fishing, 207, 227, 409 
National Football League, 73, 
Super Bowl, 103, 636 


TO 


1.1 An Overview of Statistics 


1.2 Data Classification 


■ CASE STUDY 
■ ACTIVITY 
■ USES AND ABUSES 
■ REAL STATISTICS—REAL DECISIONS 
■ HISTORY OF STATISTICS—TIMELINE 
■ TECHNOLOGY 


In 2008, the population of New 
Orleans, Louisiana grew faster than 
any other large city in the United 
States. Despite the increase, the 
population of 311,853 was still well 
below the pre-Hurricane Katrina 
population of 484,674. 




WHERE YOU'VE BEEN 


You are already familiar with many of the practices of statistics, 
such as taking surveys, collecting data, and describing populations. 
What you may not know is that collecting accurate statistical data 
is often difficult and costly. Consider, for instance, the monumental 
task of counting and describing the entire population of the United 
States. If you were in charge of such a census, how would you do it? 
How would you ensure that your results are accurate? These and many 
more concerns are the responsibility of the United States Census 
Bureau, which conducts the census every decade. 


For the 2010 Census, the Census Bureau sent short forms to every 
household. Short forms ask all members of every household such things 
as their gender, age, race, and ethnicity. Previously, a long form, 
which covered additional topics, was sent to about 17% of the 
population. But for the first time since 1940, the long form is being 
replaced by the American Community Survey, which will survey about 
3 million households a year throughout the decade. These 3 million 
households will form a sample. In this course, you will learn how the 
data collected from a sample are used to infer characteristics about 
the entire population. 


WHERE YOU'RE GOING 


In Chapter 1, you will be introduced to the basic concepts and goals 
of statistics. For instance, statistics were used to construct the 
following graphs, which show the fastest growing U.S. cities 
(population over 100,000) in 2008 by percent increase in population, 
U.S. cities with the largest numerical increases in population, and 
the regions where the cities are located. 


[Figure: Fastest Growing U.S. Cities (Population over 100,000)] 

[Figure: Location of the 25 Fastest Growing U.S. Cities. South: 68%] 

[Figure: Location of the 25 U.S. Cities with Largest Numerical 
Increases. Northeast: 4%; Midwest: 8%] 
CHAPTER 1 INTRODUCTION TO STATISTICS 


1.1 An Overview of Statistics 


WHAT YOU SHOULD LEARN 


> The definition of statistics 

> How to distinguish between a population and a sample and 
between a parameter and a statistic 

> How to distinguish between descriptive statistics and 
inferential statistics 


A Definition of Statistics » Data Sets » Branches of Statistics 


> A DEFINITION OF STATISTICS 


As you begin this course, you may wonder: What is statistics? Why should I study 
statistics? How can studying statistics help me in my profession? Almost every day 
you are exposed to statistics. For instance, consider the following. 


• “The number of Americans with diabetes will nearly double in the next 
25 years.” (Source: Diabetes Care) 


• “The NRF expects holiday sales to decline 1% versus a 3.4% drop in holiday 
sales the previous year.” (Source: National Retail Federation) 


• “EIA projects total U.S. natural gas consumption will decline by 2.6 percent 
in 2009 and increase by 0.5 percent in 2010.” (Source: Energy Information 
Administration) 


The three statements you just read are based on the collection of data. 


DEFINITION 


Data consist of information coming from observations, counts, measurements, 
or responses. 


Sometimes data are presented graphically. If you have ever read 
USA TODAY, you have certainly seen one of that newspaper’s most popular 
features, USA TODAY Snapshots. Graphics such as this present information in a 
way that is easy to understand. 


[USA TODAY Snapshot: Job seekers need a keen eye. How many typos in a 
résumé does it take for you to decide not to consider a job candidate 
for a position with your company? Two: 36%; Three: 14%; Don't know: 3%; 
Four or more: 7%. Source: Accountemps] 


The use of statistics dates back to census taking in ancient Babylonia, Egypt, 
and later in the Roman Empire, when data were collected about matters 
concerning the state, such as births and deaths. In fact, the word statistics is derived 
from the Latin word status, meaning “state.” So, what is statistics? 


DEFINITION 


Statistics is the science of collecting, organizing, analyzing, and interpreting 
data in order to make decisions. 


INSIGHT 


A census consists of data from an entire population. But, unless 
a population is small, it is usually impractical to obtain all the 
population data. In most studies, information must be obtained 
from a sample. 


SECTION 1.1 AN OVERVIEW OF STATISTICS 3 


> DATA SETS 


There are two types of data sets you will use when studying statistics. These data 
sets are called populations and samples. 


DEFINITION 


A population is the collection of all outcomes, responses, measurements, or 
counts that are of interest. 


A sample is a subset, or part, of a population. 


A sample should be representative of a population so that sample data 
can be used to form conclusions about that population. Sample data must be 
collected using an appropriate method, such as random sampling. (You will learn 
more about random sampling in Section 1.3.) If they are not collected using an 
appropriate method, the data are of no value. 
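

As a rough illustration of what random sampling means in practice, the 
following Python sketch draws a simple random sample from a numbered 
population. The population of 10,000 ID numbers, the sample size, the 
seed, and the function name simple_random_sample are all hypothetical 
choices made for this illustration; Section 1.3 describes the sampling 
techniques themselves. 

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every subset of size n is equally likely.

    The optional seed only makes the illustration repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: ID numbers 1 through 10,000 (an assumption for this sketch).
population_ids = list(range(1, 10001))

# Select 1500 IDs to survey, the same sample size used in Example 1.
sample_ids = simple_random_sample(population_ids, 1500, seed=1)

print(len(sample_ids))       # number of members selected
print(len(set(sample_ids)))  # equals the line above, so no member was chosen twice
```

Because the members are drawn without replacement and each has the same 
chance of selection, no part of the population is favored by the 
selection procedure. 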


EXAMPLE 1 


> Identifying Data Sets 


In a recent survey, 1500 adults in the United States were asked if they thought 
there was solid evidence of global warming. Eight hundred fifty-five of the 
adults said yes. Identify the population and the sample. Describe the sample 
data set. (Adapted from Pew Research Center) 


> Solution 


The population consists of the responses of all adults in the United States, and 
the sample consists of the responses of the 1500 adults in the United States in 
the survey. The sample is a subset of the responses of all adults in the United 
States. The sample data set consists of 855 yes’s and 645 no’s. 


[Figure: Venn diagram. Outer region: Responses of all adults in the United
States (population). Inner region: Responses of adults in survey (sample).]
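
The sample data set in Example 1 can also be written out directly. The following Python sketch is only an illustration under that reading of the example; the variable names are assumptions and are not part of the survey.

```python
# Sample data set from Example 1: 855 "yes" and 645 "no" responses.
sample = ["yes"] * 855 + ["no"] * 645

sample_size = len(sample)                  # number of adults in the sample
yes_count = sample.count("yes")            # number who answered yes
proportion_yes = yes_count / sample_size   # proportion of the sample answering yes

print(sample_size, yes_count, proportion_yes)
```

The proportion describes only the 1500 responses in the sample. The corresponding proportion for the responses of all adults in the United States remains unknown.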


> Try It Yourself 1 


The U.S. Department of Energy conducts weekly surveys of approximately 
900 gasoline stations to determine the average price per gallon of regular 
gasoline. On January 11, 2010, the average price was $2.75 per gallon. Identify 
the population and the sample. Describe the sample data set. (Source: Energy 
Information Administration) 


a. Identify the population and the sample. 
b. What does the sample data set consist of? Answer: Page A30 


Whether a data set is a population or a sample usually depends on the context 
of the real-life situation. For instance, in Example 1, the population was the set of 
responses of all adults in the United States. Depending on the purpose of the 
survey, the population could have been the set of responses of all adults who live 
in California or who have cellular phones or who read a particular magazine. 


STUDY TIP

The terms parameter and statistic are easy to remember if you use the
mnemonic device of matching the first letters in population parameter and
the first letters in sample statistic.


How accurate is the U.S. census? According to a post-census evaluation
conducted by the Census Bureau, the 1990 census undercounted the U.S.
population by an estimated 4.0 million people. The 1990 census was the first
census since at least 1940 to be less accurate than its predecessor. Notice
that the undercount for the 2000 census was -1.3 million people. This means
that the 2000 census overcounted the U.S. population by 1.3 million people.

[Figure: U.S. Census Undercount. Vertical axis: Population undercount.
Horizontal axis: Year, 1940 to 2000.]

What are some difficulties in collecting population data?


Two important terms that are used throughout this course are parameter and
statistic.


DEFINITION 


A parameter is a numerical description of a population characteristic. 


A statistic is a numerical description of a sample characteristic. 


It is important to note that a sample statistic can differ from sample to 
sample whereas a population parameter is constant for a population. 
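
A small simulation can make this distinction concrete. The Python sketch below uses a made-up population of salaries (the values, the sample size of 200, and the seed are all assumptions chosen only for illustration) and compares the population mean with the means of three samples.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: starting salaries (in dollars) of 10,000 graduates.
population = [random.randint(40_000, 120_000) for _ in range(10_000)]

# Parameter: computed from the entire population, so it is one fixed number.
parameter = sum(population) / len(population)

# Statistics: computed from samples, so they can differ from sample to sample.
statistics = []
for _ in range(3):
    sample = random.sample(population, 200)
    statistics.append(sum(sample) / len(sample))

print("population mean (parameter):", round(parameter, 2))
print("sample means (statistics):", [round(s, 2) for s in statistics])
```

Each run of the loop draws a different sample, so the three sample means generally differ from one another, while the population mean does not change unless the population itself changes.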


EXAMPLE 2 


> Distinguishing Between a Parameter and a Statistic 


Decide whether the numerical value describes a population parameter or a 
sample statistic. Explain your reasoning. 


1. A recent survey of 200 college career centers reported that the average 
starting salary for petroleum engineering majors is $83,121. (Source: National 
Association of Colleges and Employers) 


2. The 2182 students who accepted admission offers to Northwestern 
University in 2009 have an average SAT score of 1442. (Source: Northwestern 
University) 


3. In a random check of a sample of retail stores, the Food and Drug 
Administration found that 34% of the stores were not storing fish at the 
proper temperature. 


> Solution 


1. Because the average of $83,121 is based on a subset of the population, it is 
a sample statistic. 


2. Because the SAT score of 1442 is based on all the students who accepted 
admission offers in 2009, it is a population parameter. 


3. Because the percent of 34% is based on a subset of the population, it is a 
sample statistic. 


> Try It Yourself 2 


In 2009, Major League Baseball teams spent a total of $2,655,395,194 on 
players’ salaries. Does this numerical value describe a population parameter or 
a sample statistic? (Source: USA Today) 


a. Decide whether the numerical value is from a population or a sample. 
b. Specify whether the numerical value is a parameter or a statistic. 
Answer: Page A30 


In this course, you will see how the use of statistics can help you make 
informed decisions that affect your life. Consider the census that the U.S.
government takes every decade. When taking the census, the Census Bureau 
attempts to contact everyone living in the United States. Although it is impossible 
to count everyone, it is important that the census be as accurate as it can be, 
because public officials make many decisions based on the census information. 
Data collected in the 2010 census will determine how to assign congressional 
seats and how to distribute public funds. 


> BRANCHES OF STATISTICS


The study of statistics has two major branches: descriptive statistics and 
inferential statistics. 


DEFINITION 


Descriptive statistics is the branch of statistics that involves the organization, 
summarization, and display of data. 


Inferential statistics is the branch of statistics that involves using a sample to 
draw conclusions about a population. A basic tool in the study of inferential 
statistics is probability. 


EXAMPLE 3 


> Descriptive and Inferential Statistics 
Decide which part of the study represents the descriptive branch of statistics. 
What conclusions might be drawn from the study using inferential statistics? 


1. A large sample of men, aged 48, was studied for 18 years. For unmarried
men, approximately 70% were alive at age 65. For married men, 90% were
alive at age 65. (Source: The Journal of Family Issues)


2. In a sample of Wall Street analysts, the percentage who incorrectly forecasted 
high-tech earnings in a recent year was 44%. (Source: Bloomberg News) 


> Solution 


1. Descriptive statistics involves statements such as “For unmarried men,
approximately 70% were alive at age 65” and “For married men, 90% were
alive at age 65.” A possible inference drawn from the study is that being
married is associated with a longer life for men.


2. The part of this study that represents the descriptive branch of statistics 
involves the statement “the percentage [of Wall Street analysts] who 
incorrectly forecasted high-tech earnings in a recent year was 44%.” A 
possible inference drawn from the study is that the stock market is difficult 
to forecast, even for professionals. 


> Try It Yourself 3 


A survey conducted among 1017 men and women by Opinion Research 
Corporation International found that 76% of women and 60% of men had a 
physical examination within the previous year. (Source: Men’s Health) 


a. Identify the descriptive aspect of the survey. 
b. What inferences could be drawn from this survey? Answer: Page A30 


Throughout this course you will see applications of both branches. A major 
theme in this course will be how to use sample statistics to make inferences about 
unknown population parameters. 


BUILDING BASIC SKILLS AND VOCABULARY


1. How is a sample related to a population? 


2. Why is a sample used more often than a population? 
3. What is the difference between a parameter and a statistic? 


4. What are the two main branches of statistics? 


True or False? In Exercises 5-10, determine whether the statement is true or
false. If it is false, rewrite it as a true statement. 


5. A statistic is a measure that describes a population characteristic. 
6. A sample is a subset of a population. 


7. It is impossible for the Census Bureau to obtain all the census data about the 
population of the United States. 


8. Inferential statistics involves using a population to draw a conclusion about 
a corresponding sample. 


9. A population is the collection of some outcomes, responses, measurements, 
or counts that are of interest. 


10. A sample statistic will not change from sample to sample. 


Classifying a Data Set In Exercises 11-20, determine whether the data set is
a population or a sample. Explain your reasoning. 


11. The height of each player on a school’s basketball team 

12. The amount of energy collected from every wind turbine on a wind farm 
13. A survey of 500 spectators from a stadium with 42,000 spectators 

14. The annual salary of each pharmacist at a pharmacy 

15. The cholesterol levels of 20 patients in a hospital with 100 patients 

16. The number of televisions in each U.S. household 

17. The final score of each golfer in a tournament 

18. The age of every third person entering a clothing store 

19. The political party of every U.S. president 


20. The soil contamination levels at 10 locations near a landfill 


Graphical Analysis In Exercises 21-24, use the Venn diagram to identify the 
population and the sample. 


21. [Venn diagram] Outer region: Parties of registered voters in Warren
County. Inner region: Parties of Warren County voters who respond to online
survey.

22. [Venn diagram] Outer region: Number of students who donate at a blood
drive. Inner region: Number of students who donate that have type O+ blood.

23. [Venn diagram] Outer region: Ages of adults in the United States who own
cellular phones. Inner region: Ages of adults in the U.S. who own Samsung
cellular phones.

24. [Venn diagram] Outer region: Incomes of home owners in Texas. Inner
region: Incomes of home owners in Texas with mortgages.


USING AND INTERPRETING CONCEPTS


Identifying Populations and Samples In Exercises 25-34, identify the 
population and the sample. 


25. A survey of 1000 U.S. adults found that 59% think buying a home is the best
investment a family can make. (Source: Rasmussen Reports)

26. A study of 33,043 infants in Italy was conducted to find a link between a
heart rhythm abnormality and sudden infant death syndrome. (Source: New
England Journal of Medicine)

27. A survey of 1442 U.S. adults found that 36% received an influenza vaccine
for the current flu season. (Source: Zogby International)

28. A survey of 1600 people found that 76% plan on using the Microsoft
Windows 7™ operating system at their businesses. (Source: Information
Technology Intelligence Corporation and Sunbelt Software)

29. A survey of 800 registered voters found that 50% think economic stimulus is
the most important issue to consider when voting for Congress. (Source:
Diageo/Hotline Poll)

30. A survey of 496 students at a college found that 10% planned on traveling
out of the country during spring break.

31. A survey of 546 U.S. women found that more than 56% are the primary
investors in their households. (Adapted from Roper Starch Worldwide for Intuit)

32. A survey of 791 vacationers from the United States found that they planned
on spending at least $2000 for their next vacation.

33. A magazine mails questionnaires to each company in Fortune magazine’s
top 100 best companies to work for and receives responses from 85 of them.

34. At the end of the day, a quality control inspector selects 20 light bulbs from
the day’s production and tests them.


Distinguishing Between a Parameter and a Statistic In Exercises
35-42, determine whether the numerical value is a parameter or a statistic. Explain
your reasoning.


35. The average annual salary for 35 of a company’s 1200 accountants is $68,000.


36. In a survey of a sample of high school students, 43% said that their mothers
had taught them the most about managing money. (Source: Harris Poll for
Girls Incorporated)


37. Sixty-two of the 97 passengers aboard the Hindenburg airship survived its
explosion.


38. In January 2010, 52% of the governors of the 50 states in the United States 
were Democrats. 


39. In a survey of 300 computer users, 8% said their computers had 
malfunctions that needed to be repaired by service technicians. 


40. In a recent year, the interest category for 12% of all new magazines was 
sports. (Source: Oxbridge Communications) 


41. In a recent survey of 2000 people, 44% said China is the world’s leading 
economic power. (Source: Pew Research Center) 


42. In a recent year, the average math score for all graduates on the ACT
was 21.0. (Source: ACT, Inc.)


43. Which part of the survey described in Exercise 31 represents the descriptive 
branch of statistics? Make an inference based on the results of the survey. 


44. Which part of the survey described in Exercise 32 represents the descriptive 
branch of statistics? Make an inference based on the results of the survey. 


EXTENDING CONCEPTS


45. Identifying Data Sets in Articles Find a newspaper or magazine article that 

describes a survey. 

(a) Identify the sample used in the survey. 

(b) What is the sample’s population? 

(c) Make an inference based on the results of the survey. 

46. Sleep Deprivation In a recent study, volunteers who had 8 hours of sleep 
were three times more likely to answer questions correctly on a math test 
than were sleep-deprived participants. (Source: CBS News) 

(a) Identify the sample used in the study. 

(b) What is the sample’s population? 

(c) Which part of the study represents the descriptive branch of statistics? 
(d) Make an inference based on the results of the study. 

47. Living in Florida A study shows that senior citizens who live in Florida 
have better memories than senior citizens who do not live in Florida. 
(a) Make an inference based on the results of this study. 

(b) What is wrong with this type of reasoning? 

48. Increase in Obesity Rates A study shows that the obesity rate among boys 
ages 2 to 19 has increased over the past several years. (Source: Washington Post) 
(a) Make an inference based on the results of this study. 

(b) What is wrong with this type of reasoning? 

49. Writing Write an essay about the importance of statistics for one of the 
following. 

• A study on the effectiveness of a new drug
• An analysis of a manufacturing process
• Making conclusions about voter opinions using surveys
SECTION 1.2 DATA CLASSIFICATION


Types of Data > Levels of Measurement


WHAT YOU SHOULD LEARN


> How to distinguish between qualitative data and quantitative data

> How to classify data with respect to the four levels of measurement:
nominal, ordinal, interval, and ratio


> TYPES OF DATA


When doing a study, it is important to know the kind of data involved. The nature
of the data you are working with will determine which statistical procedures can
be used. In this section, you will learn how to classify data by type and by level of
measurement. Data sets can consist of two types of data: qualitative data and
quantitative data.


DEFINITION


Qualitative data consist of attributes, labels, or nonnumerical entries.


Quantitative data consist of numerical measurements or counts.


EXAMPLE 1 


> Classifying Data by Type 

The suggested retail prices of several Ford vehicles are shown in the table. 
Which data are qualitative data and which are quantitative data? Explain your 
reasoning. (Source: Ford Motor Company) 


Model            Suggested retail price
Focus Sedan      $15,995
Fusion           $19,270
Mustang          $20,995
Edge             $26,920
Flex             $28,495
Escape Hybrid    $32,260
Expedition       $35,085
F-450            $44,145


> Solution 


The information shown in the table can be separated into two data sets.
One data set contains the names of vehicle models, and the other contains the
suggested retail prices of vehicle models. The names are nonnumerical entries,
so these are qualitative data. The suggested retail prices are numerical entries,
so these are quantitative data.


> Try It Yourself 1

The populations of several U.S. cities are shown in the table. Which data are
qualitative data and which are quantitative data? (Source: U.S. Census Bureau)

City               Population
Baltimore, MD      [illegible in source]
Jacksonville, FL   [illegible in source]
Memphis, TN        [illegible in source]
Pasadena, CA       143,080
San Antonio, TX    1,351,305
Seattle, WA        598,541

a. Identify the two data sets.
b. Decide whether each data set consists of numerical or nonnumerical entries.
c. Specify the qualitative data and the quantitative data. Answer: Page A30
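
Returning to Example 1, the separation of the Ford table into two data sets can be sketched in Python. This is a rough illustration only: the function name `data_type` and the rule it applies (every entry stored as a number) are assumptions made for the sketch, not a general classification procedure.

```python
# Ford vehicle table from Example 1: (model name, suggested retail price in dollars).
vehicles = [
    ("Focus Sedan", 15995), ("Fusion", 19270), ("Mustang", 20995),
    ("Edge", 26920), ("Flex", 28495), ("Escape Hybrid", 32260),
    ("Expedition", 35085), ("F-450", 44145),
]

def data_type(entries):
    """Label a data set by whether every entry is stored as a number."""
    if all(isinstance(e, (int, float)) for e in entries):
        return "quantitative"
    return "qualitative"

# Separate the table into its two data sets.
names = [name for name, _ in vehicles]
prices = [price for _, price in vehicles]

print(data_type(names), data_type(prices))
```

The rule is deliberately naive. Numbers that serve only as labels, such as telephone numbers, would be misclassified if they were stored as integers, so the meaning of the entries still has to be judged by the person classifying the data.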
In 2009, Forbes Magazine chose the 75 best business schools in the United
States. Forbes based their rankings on the return on investment achieved by
the graduates from the class of 2004. Graduates of the top five M.B.A.
programs typically earn more than $200,000 within five years.
(Source: Forbes)

1. Stanford
2. Dartmouth
3. Harvard
4. Chicago
5. Pennsylvania

In this list, what is the level of measurement?


> LEVELS OF MEASUREMENT


Another characteristic of data is its level of measurement. The level of 
measurement determines which statistical calculations are meaningful. The four 
levels of measurement, in order from lowest to highest, are nominal, ordinal, 
interval, and ratio. 


DEFINITION 


Data at the nominal level of measurement are qualitative only. Data at this 
level are categorized using names, labels, or qualities. No mathematical 
computations can be made at this level. 


Data at the ordinal level of measurement are qualitative or quantitative. Data 
at this level can be arranged in order, or ranked, but differences between data 
entries are not meaningful. 


When numbers are at the nominal level of measurement, they simply 
represent a label. Examples of numbers used as labels include Social Security 
numbers and numbers on sports jerseys. For instance, it would not make sense 
to add the numbers on the players’ jerseys for the Chicago Bears. 


EXAMPLE 2 


> Classifying Data by Level 


Two data sets are shown. Which data set consists of data at the nominal level? 
Which data set consists of data at the ordinal level? Explain your reasoning. 
(Source: The Nielsen Company) 


First data set:
1. American Idol-Wednesday
2. American Idol-Tuesday
3. Dancing with the Stars
4. NCIS
5. The Mentalist

Second data set:
WTAE (ABC)
WPXI (NBC)
KDKA (CBS)
WPGH (FOX)


> Solution 

The first data set lists the ranks of five TV programs. The data set consists of 
the ranks 1, 2, 3, 4, and 5. Because the ranks can be listed in order, these data
are at the ordinal level. Note that the difference between a rank of 1 and 5 has 
no mathematical meaning. The second data set consists of the call letters of 
each network affiliate in Pittsburgh. The call letters are simply the names of 
network affiliates, so these data are at the nominal level. 


> Try It Yourself 2 


Consider the following data sets. For each data set, decide whether the data are 
at the nominal level or at the ordinal level. 


1. The final standings for the Pacific Division of the National Basketball 
Association 


2. A collection of phone numbers 


a. Identify what each data set represents. 
b. Specify the level of measurement and justify your answer. 
Answer: Page A30 
New York Yankees’ World Series Victories (Years)

1923, 1927, 1928, 1932, 1936,
1937, 1938, 1939, 1941, 1943,
1947, 1949, 1950, 1951, 1952,
1953, 1956, 1958, 1961, 1962,
1977, 1978, 1996, 1998, 1999,
2000, 2009


Home Run Totals (by Team)

Baltimore      160
Boston         212
Chicago        184
Cleveland      161
Detroit        183
Kansas City    144
Los Angeles    173
Minnesota      172
New York       244
Oakland        135
Seattle        160
Tampa Bay      199
Texas          224
Toronto        209

The two highest levels of measurement consist of quantitative data only.


DEFINITION 


Data at the interval level of measurement can be ordered, and meaningful 
differences between data entries can be calculated. At the interval level, a 
zero entry simply represents a position on a scale; the entry is not an inherent
zero.


Data at the ratio level of measurement are similar to data at the interval level, 
with the added property that a zero entry is an inherent zero. A ratio of two 
data values can be formed so that one data value can be meaningfully 
expressed as a multiple of another. 


An inherent zero is a zero that implies “none.” For instance, the amount 
of money you have in a savings account could be zero dollars. In this case, 
the zero represents no money; it is an inherent zero. On the other hand, a 
temperature of 0°C does not represent a condition in which no heat is present. 
The 0°C temperature is simply a position on the Celsius scale; it is not an 
inherent zero. 

To distinguish between data at the interval level and at the ratio level, 
determine whether the expression “twice as much” has any meaning in the 
context of the data. For instance, $2 is twice as much as $1, so these data are at 
the ratio level. On the other hand, 2°C is not twice as warm as 1°C, so these data 
are at the interval level. 
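
The “twice as much” test can be checked numerically. The Python sketch below is an illustration that goes slightly beyond the text: it converts the Celsius readings to kelvins (an absolute scale whose zero is an inherent zero, using the standard offset of 273.15), which is an added assumption brought in only to show why the ratio 2/1 is misleading for Celsius temperatures.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins; 0 K is an inherent zero."""
    return c + 273.15

# Ratio level: dollars have an inherent zero, so the ratio is meaningful.
dollar_ratio = 2 / 1

# Interval level: 0 degrees C is only a position on the scale, so compare the
# same two temperatures on a scale that does have an inherent zero.
kelvin_ratio = celsius_to_kelvin(2) / celsius_to_kelvin(1)

print(dollar_ratio)
print(round(kelvin_ratio, 4))
```

On the absolute scale the two temperatures differ by well under one percent, which is consistent with the statement that 2°C is not twice as warm as 1°C.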


EXAMPLE 3 


> Classifying Data by Level 


Two data sets are shown at the left. Which data set consists of data at the 
interval level? Which data set consists of data at the ratio level? Explain your 
reasoning. (Source: Major League Baseball) 


> Solution 


Both of these data sets contain quantitative data. Consider the dates of the 
Yankees’ World Series victories. It makes sense to find differences between 
specific dates. For instance, the time between the Yankees’ first and last World 
Series victories is 


2009 - 1923 = 86 years.


But it does not make sense to say that one year is a multiple of another. So, 
these data are at the interval level. However, using the home run totals, you 
can find differences and write ratios. From the data, you can see that Texas hit 
63 more home runs than Cleveland hit and that New York hit about 1.5 times 
as many home runs as Seattle hit. So, these data are at the ratio level. 
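
The arithmetic in this solution can be reproduced with a few lines of Python. The sketch below uses only the values cited in the solution and the accompanying data sets; the variable names are assumptions made for illustration.

```python
# Interval level: years of the Yankees' first and last listed World Series wins.
first_win, last_win = 1923, 2009

# Ratio level: home run totals for the four teams mentioned in the solution.
home_runs = {"Cleveland": 161, "New York": 244, "Seattle": 160, "Texas": 224}

years_between = last_win - first_win                              # a meaningful difference
texas_minus_cleveland = home_runs["Texas"] - home_runs["Cleveland"]
new_york_to_seattle = home_runs["New York"] / home_runs["Seattle"]  # a meaningful ratio

print(years_between, texas_minus_cleveland, round(new_york_to_seattle, 1))
```

Dividing one year by another (for instance 2009 / 1923) would also produce a number, but that number has no interpretation, which is why the dates stay at the interval level.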


> Try It Yourself 3 
Decide whether the data are at the interval level or at the ratio level. 


1. The body temperatures (in degrees Fahrenheit) of an athlete during an 
exercise session 


2. The heart rates (in beats per minute) of an athlete during an exercise session 


a. Identify what each data set represents. 
b. Specify the level of measurement and justify your answer. 
Answer: Page A30 
The following tables summarize which operations are meaningful at each
of the four levels of measurement. When identifying a data set’s level of
measurement, use the highest level that applies.


Level of measurement   Put in a category   Put in order   Find differences   Find ratios
Nominal                Yes                 No             No                 No
Ordinal                Yes                 Yes            No                 No
Interval               Yes                 Yes            Yes                No
Ratio                  Yes                 Yes            Yes                Yes


Summary of Four Levels of Measurement


Nominal level
Example of a data set: Types of Shows Televised by a Network
    Comedy           Documentaries
    Drama            Cooking
    Reality Shows    Soap Operas
    Sports           Talk Shows
Meaningful calculations: Put in a category. For instance, a show televised by
the network could be put into one of the eight categories shown.

Ordinal level
Example of a data set: Motion Picture Association of America Ratings
    G        General Audiences
    PG       Parental Guidance Suggested
    PG-13    Parents Strongly Cautioned
    R        Restricted
    NC-17    No One Under 17 Admitted
Meaningful calculations: Put in a category and put in order. For instance, a
PG rating has a stronger restriction than a G rating.

Interval level
Example of a data set: Average Monthly Temperatures (in degrees Fahrenheit)
for Denver, CO
    Jan 29.2    Jul 73.4
    Feb 33.2    Aug 71.7
    Mar 39.6    Sep 62.4
    Apr 47.6    Oct 51.0
    May 57.2    Nov 37.5
    Jun 67.6    Dec 30.3
Meaningful calculations: Put in a category, put in order, and find differences
between values. For instance, 57.2 - 47.6 = 9.6°F. So, May is 9.6° warmer
than April.

Ratio level
Example of a data set: Average Monthly Precipitation (in inches) for Orlando, FL
    Jan [illegible]    Jul 7.2
    Feb 2.4            Aug 6.3
    Mar [illegible]    Sep 5.8
    Apr 2.4            Oct [illegible]
    May 3.7            Nov 2.3
    Jun 7.4            Dec 2.3
Meaningful calculations: Put in a category, put in order, find differences
between values, and find ratios of values. For instance, 7.4/3.7 = 2. So, there
is twice as much rain in June as in May.
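
The cumulative pattern in the summary (each level allows everything the lower levels allow, plus one more operation) can be expressed compactly. The Python sketch below is one possible encoding; the operation labels and the function name `meaningful_operations` are assumptions chosen for illustration.

```python
# Levels of measurement, from lowest to highest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# The operation that first becomes meaningful at each level, in the same order.
OPERATIONS = ["put in a category", "put in order", "find differences", "find ratios"]

def meaningful_operations(level):
    """Return the operations that are meaningful at the given level."""
    rank = LEVELS.index(level)
    # A level allows its own operation plus those of every lower level.
    return OPERATIONS[: rank + 1]

for level in LEVELS:
    print(level, meaningful_operations(level))
```

For example, `meaningful_operations("interval")` lists categorizing, ordering, and finding differences, but not ratios, which matches the Denver temperature example.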
BUILDING BASIC SKILLS AND VOCABULARY


1. Name each level of measurement for which data can be qualitative.


2. Name each level of measurement for which data can be quantitative.


True or False? In Exercises 3-6, determine whether the statement is true or
false. If it is false, rewrite it as a true statement.


3. Data at the ordinal level are quantitative only.


4. For data at the interval level, you cannot calculate meaningful differences
between data entries.


5. More types of calculations can be performed with data at the nominal level
than with data at the interval level.


6. Data at the ratio level cannot be put in order.


USING AND INTERPRETING CONCEPTS


Classifying Data by Type In Exercises 7-18, determine whether the data are
qualitative or quantitative. Explain your reasoning. 


7. telephone numbers in a directory 8. heights of hot air balloons 

9. body temperatures of patients 10. eye colors of models 
11. lengths of songs on MP3 player 12. carrying capacities of pickups 
13. player numbers for a soccer team 14. student ID numbers 
15. weights of infants at a hospital 16. species of trees in a forest 
17. responses on an opinion poll 18. wait times at a grocery store 


Classifying Data by Level In Exercises 19-24, determine whether the data 
are qualitative or quantitative, and identify the data set’s level of measurement. 
Explain your reasoning. 


19. Football The top five teams in the final college football poll released in
January 2010 are listed. (Source: Associated Press)

1. Alabama 2. Texas 3. Florida 4. Boise State 5. Ohio State


20. Politics The three political parties in the 111th Congress are listed below.

Republican Democrat Independent


21. Top Salespeople The regions representing the top salespeople in a
corporation for the past six years are given.

Southeast Northwest Northeast
Southeast Southwest Southwest


22. Fish Lengths The lengths (in inches) of a sample of striped bass caught in
Maryland waters are listed. (Adapted from National Marine Fisheries Service,
Fisheries Statistics and Economics Division)

16 17.25 19 18.75 21 20.3 19.8 24 21.82


23. Best Seller List The top five hardcover nonfiction books on The New York
Times Best Seller List on January 19, 2010 are shown. (Source: The New York
Times)

1. Committed 2. Have a Little Faith 3. The Checklist Manifesto
4. Going Rogue 5. Stones Into Schools


24. Ticket Prices The average ticket prices for 10 Broadway shows in 2009 are
listed. (Adapted from The Broadway League)

$149 $128 $124 $91 $96 $106 $112 $95 $86 $74


Graphical Analysis In Exercises 25-28, identify the level of measurement of
the data listed on the horizontal axis in the graph.


25. [Graph: Over the Next Few Years, How Likely Is It That the United States
Will Enter a 1930s-Like Depression? Horizontal axis: Response.
(Source: Rasmussen Reports)]

26. [Graph: Average January Snowfall for 15 Cities. Horizontal axis: Snowfall
(in inches). (Source: National Climatic Data Center)]

27. [Graph: Gender Profile of the 111th Congress. Horizontal axis: Gender
(Women, Men). (Source: Congressional Research Service)]

28. [Graph: Motor Vehicle Accidents by Year. Horizontal axis: Year (2003 to
2007). (Source: National Safety Council)]


29. The following items appear on a physician’s intake form. Identify the level of
measurement of the data.
a. Temperature b. Allergies
c. Weight d. Pain level (scale of 0 to 10)

30. The following items appear on an employment application. Identify the level
of measurement of the data.
a. Highest grade level completed b. Gender
c. Year of college graduation d. Number of years at last job


EXTENDING CONCEPTS


31. Writing What is an inherent zero? Describe three examples of data sets
that have inherent zeros and three that do not.


32. Writing Describe two examples of data sets for each of the four levels of
measurement. Justify your answer.
Rating Television Shows in the United States


The Nielsen Company has been rating television programs
for more than 60 years. Nielsen uses several sampling
procedures, but its main one is to track the viewing
patterns of 20,000 households. These contain more than
45,000 people and are chosen to form a cross section of
the overall population. The households represent various
locations, ethnic groups, and income brackets. The data
gathered from the Nielsen sample of 20,000 households
are used to draw inferences about the population of all
households in the United States.


[Figure: Venn diagram. Outer region: TV programs viewed by all households
in the United States (114.5 million households). Inner region: TV programs
viewed by Nielsen sample (20,000 households).]

Top-Ranked Programs in Overall Viewing for the Week of 11/23/09-11/29/09 


Rank  Rank Last Week  Program Name  Network  Day, Time  Rating  Share  Audience
1 2 Dancing with the Stars ABC Mon., 8:00 P.M. 12.9 19 20,411,000
2 1 NCIS CBS Tues., 8:00 P.M. 12.3 20 20,348,000
3 4 Dancing with the Stars Results ABC Tues., 9:00 P.M. 12.0 20 19,294,000
4 3 NBC Sunday Night Football NBC Sun., 8:15 P.M. 15 18 19,210,000
5 8 NCIS: Los Angeles CBS Tues., 9:00 P.M. 10.4 16 17,221,000 
6 5 60 Minutes CBS Sun., 7:00 P.M. 9.0 14 14,377,000 
7 15 The Big Bang Theory CBS Mon., 9:30 P.M. 8.4 13 14,129,000 
8 16 Sunday Night NFL Pre-Kick NBC Sun., 8:00 P.M. 8.4 13 13,927,000 
9 12 Two and a Half Men CBS Mon., 9:00 P.M. 8.3 12 13,877,000 
10 11 Criminal Minds CBS Wed., 9:00 P.M. 8.2 14 13,605,000 


Copyrighted information of The Nielsen Company, licensed for use herein. 


EXERCISES


1. Rating Points Each rating point represents 1,145,000 households, or 1% of
the households in the United States. Does a program with a rating of 8.4 have
twice the number of households as a program with a rating of 4.2? Explain
your reasoning.

2. Sampling Percent What percentage of the total number of U.S. households
is used in the Nielsen sample?

3. Nominal Level of Measurement Which columns in the table contain data at
the nominal level?

4. Ordinal Level of Measurement Which columns in the table contain data at
the ordinal level? Describe two ways that the data can be ordered.

5. Interval Level of Measurement Which column in the table contains data at
the interval level? How can these data be ordered?

6. Ratio Level of Measurement Which columns contain data at the ratio level?

7. Rankings The column listed as “Share” gives the percentage of televisions
in use at a given time. The 11th ranked program for this week is CSI: Miami
with a rating of 8.4 and share of 14. Using this information, how does Nielsen
rank the programs? Why do you think they do it this way? Explain your
reasoning.

8. Inferences What decisions (inferences) can be made on the basis of the
Nielsen ratings?


Presented by: https://jafrilibrary.org 


16 CHAPTER 1 INTRODUCTION TO STATISTICS 


WHAT YOU SHOULD LEARN 


» How to design a statistical 
study 


Vv 


How to collect data by 

doing an observational study, 
performing an experiment, 
using a simulation, or using 
a survey 


vv 


How to design an experiment 


vw 


How to create a sample using 
random sampling, simple 
random sampling, stratified 
sampling, cluster sampling, 
and systematic sampling 

and how to identify a 

biased sample 


INSIGHT 


In an observational study, 
a researcher does not 
influence the responses. 
In an experiment, a 
researcher deliberately 
applies a treatment 
before observing 
the responses. 


Data Collection and Experimental Design 


Design of a Statistical Study » Data Collection » Experimental Design 
» Sampling Techniques 


> DESIGN OF A STATISTICAL STUDY 


The goal of every statistical study is to collect data and then use the data to make 
a decision. Any decision you make using the results of a statistical study is only 
as good as the process used to obtain the data. If the process is flawed, then the 
resulting decision is questionable. 

Although you may never have to develop a statistical study, it is likely that 
you will have to interpret the results of one. And before you interpret the results 
of a study, you should determine whether the results are valid, as well as reliable. 
In other words, you should be familiar with how to design a statistical study. 


GUIDELINES 


Designing a Statistical Study 


1. Identify the variable(s) of interest (the focus) and the population 
of the study. 


2. Develop a detailed plan for collecting data. If you use a sample, make 
sure the sample is representative of the population. 


3. Collect the data. 
4. Describe the data, using descriptive statistics techniques. 


5. Interpret the data and make decisions about the population using 
inferential statistics. 


6. Identify any possible errors. 


> DATA COLLECTION 


There are several ways you can collect data. Often, the focus of the study dictates 
the best way to collect data. The following is a brief summary of four methods of 
data collection. 


• Do an observational study In an observational study, a researcher observes 
and measures characteristics of interest of part of a population but does 
not change existing conditions. For instance, an observational study was 
performed in which researchers observed and recorded the mouthing 
behavior on nonfood objects of children up to three years old. (Source: 
Pediatrics Magazine) 


• Perform an experiment In performing an experiment, a treatment is applied 
to part of a population and responses are observed. Another part of the 
population may be used as a control group, in which no treatment is applied. 
In many cases, subjects (sometimes called experimental units) in the control 
group are given a placebo, which is a harmless, unmedicated treatment that 
is made to look like the real treatment. The responses of the treatment group 
and control group can then be compared and studied. In most cases, it is a 
good idea to use the same number of subjects for each treatment. For 
instance, an experiment was performed in which diabetics took cinnamon 
extract daily while a control group took none. After 40 days, the diabetics 
who took the cinnamon reduced their risk of heart disease while the control 
group experienced no change. (Source: Diabetes Care) 
The Gallup Organization conducts 
many polls (or surveys) regarding 
the president, Congress, and 
political and nonpolitical issues. 
A commonly cited Gallup poll 
is the public approval rating of 
the president. For instance, the 
approval ratings for President 
Barack Obama throughout 2009 
are shown in the following 
graph. (The rating is from the 
poll conducted at the end of 
each month.) 


[Graph: President’s Approval Ratings, 2009; vertical axis: percent approving] 


Discuss some ways that 
Gallup could select a biased 
sample to conduct a poll. How 
could Gallup select a sample 
that is unbiased? 
• Use a simulation A simulation is the use of a mathematical or physical 
model to reproduce the conditions of a situation or process. Collecting data 
often involves the use of computers. Simulations allow you to study situations 
that are impractical or even dangerous to create in real life, and often they 
save time and money. For instance, automobile manufacturers use simulations 
with dummies to study the effects of crashes on humans. Throughout this 
course, you will have the opportunity to use applets that simulate statistical 
processes on a computer. 


• Use a survey A survey is an investigation of one or more characteristics 
of a population. Most often, surveys are carried out on people by asking 
them questions. The most common types of surveys are done by interview, 
mail, or telephone. In designing a survey, it is important to word the questions 
so that they do not lead to biased results, which are not representative 
of a population. For instance, a survey is conducted on a sample of female 
physicians to determine whether the primary reason for their career choice 
is financial stability. In designing the survey, it would be acceptable to 
make a list of reasons and ask each individual in the sample to select her 
first choice. 


EXAMPLE 1 


> Deciding on Methods of Data Collection 


Consider the following statistical studies. Which method of data collection 
would you use to collect data for each study? Explain your reasoning. 


1. A study of the effect of changing flight patterns on the number of airplane 
accidents 


2. A study of the effect of eating oatmeal on lowering blood pressure 
3. A study of how fourth grade students solve a puzzle 


4. A study of U.S. residents’ approval rating of the U.S. president 


> Solution 
1. Because it is impractical to create this situation, use a simulation. 


2. In this study, you want to measure the effect a treatment (eating oatmeal) 
has on patients. So, you would want to perform an experiment. 


3. Because you want to observe and measure certain characteristics of part of 
a population, you could do an observational study. 


4. You could use a survey that asks, “Do you approve of the way the president 
is handling his job?” 


> Try It Yourself 1 


Consider the following statistical studies. Which method of data collection 
would you use to collect data for each study? 


1. A study of the effect of exercise on relieving depression 


2. A study of the success of graduates of a large university in finding a job 
within one year of graduation 


a. Identify the focus of the study. 
b. Identify the population of the study. 


c. Choose an appropriate method of data collection. Answer: Page A30 
INSIGHT 


The Hawthorne effect 
occurs in an experiment 
when subjects change 
their behavior simply 
because they know 
they are participating 
in an experiment. 


[Figure: Randomized Block Design, with subjects grouped into blocks by age (30-39 years old, 40-49 years old, over 50 years old)] 


> EXPERIMENTAL DESIGN 


In order to produce meaningful unbiased results, experiments should be carefully 
designed and executed. It is important to know what steps should be taken to 
make the results of an experiment valid. Three key elements of a well-designed 
experiment are control, randomization, and replication. 

Because experimental results can be ruined by a variety of factors, being able 
to control these influential factors is important. One such factor is a confounding 
variable. 


DEFINITION 


A confounding variable occurs when an experimenter cannot tell the 
difference between the effects of different factors on a variable. 


For instance, to attract more customers, a coffee shop owner experiments by 
remodeling her shop using bright colors. At the same time, a shopping mall 
nearby has its grand opening. If business at the coffee shop increases, it cannot be 
determined whether it is because of the new colors or the new shopping mall. The 
effects of the colors and the shopping mall have been confounded. 

Another factor that can affect experimental results is the placebo effect. The 
placebo effect occurs when a subject reacts favorably to a placebo when in fact 
the subject has been given no medicated treatment at all. To help control or 
minimize the placebo effect, a technique called blinding can be used. 


DEFINITION 


Blinding is a technique where the subjects do not know whether they are 
receiving a treatment or a placebo. In a double-blind experiment, neither 
the experimenter nor the subjects know if the subjects are receiving a 
treatment or a placebo. The experimenter is informed after all the data 
have been collected. This type of experimental design is preferred by 
researchers. 


Another technique that can be used to obtain unbiased results is 
randomization. 


DEFINITION 


Randomization is a process of randomly assigning subjects to different 
treatment groups. 


In a completely randomized design, subjects are assigned to different 
treatment groups through random selection. In some experiments, it may be 
necessary for the experimenter to use blocks, which are groups of subjects with 
similar characteristics. A commonly used experimental design is a randomized 
block design. To use a randomized block design, you should divide subjects with 
similar characteristics into blocks, and then, within each block, randomly assign 
subjects to treatment groups. For instance, an experimenter who is testing the 
effects of a new weight loss drink may first divide the subjects into age categories 
such as 30-39 years old, 40-49 years old, and over 50 years old, and then, within 
each age group, randomly assign subjects to either the treatment group or the 
control group as shown. 
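
The two designs described above can be sketched in a few lines of Python. This is only an illustrative sketch: the subject identifiers, the ages, the seed, and the function names are assumptions made for the example and do not come from the text.

```python
import random
from collections import defaultdict


def completely_randomized(subjects, rng):
    """Randomly split subjects into a treatment group and a control group."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


def randomized_block(subjects, block_of, rng):
    """Group subjects into blocks, then randomize within each block."""
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)
    return {name: completely_randomized(members, rng)
            for name, members in blocks.items()}


def age_block(subject):
    """Age categories from the weight loss drink example."""
    _, age = subject
    if age < 40:
        return "30-39"
    if age < 50:
        return "40-49"
    return "over 50"


# Hypothetical subjects as (identifier, age) pairs; not data from the text.
ages = [31, 35, 38, 39, 42, 44, 47, 49, 52, 58, 63, 70]
subjects = [(f"S{i:02d}", age) for i, age in enumerate(ages, start=1)]

rng = random.Random(2024)  # fixed seed so the sketch is repeatable
design = randomized_block(subjects, age_block, rng)
for block, groups in design.items():
    print(block, groups)
```

Because the random assignment happens separately inside each age block, every block contributes subjects to both the treatment group and the control group.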
INSIGHT 


The validity of an 
experiment refers to the 
accuracy and reliability 
of the experimental 
results. The results of 
a valid experiment 
are more likely to 
be accepted in the 
scientific community. 

Another type of experimental design is a matched-pairs design, where 
subjects are paired up according to a similarity. One subject in the pair is 
randomly selected to receive one treatment while the other subject receives a 
different treatment. For instance, two subjects may be paired up because of their 
age, geographical location, or a particular physical characteristic. 
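
As a rough sketch of the random step in a matched-pairs design, the following Python code decides, pair by pair, which subject receives which treatment. The names, the pairs, and the treatment labels "A" and "B" are hypothetical.

```python
import random


def matched_pairs(pairs, rng):
    """For each pair, randomly choose who gets treatment A; the other gets B."""
    assignment = []
    for first, second in pairs:
        if rng.random() < 0.5:
            first, second = second, first
        assignment.append({"A": first, "B": second})
    return assignment


# Hypothetical subjects who were paired up because they are the same age.
pairs = [("Ana", "Ben"), ("Cal", "Dee"), ("Eli", "Fay")]
result = matched_pairs(pairs, random.Random(7))
for row in result:
    print(row)
```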

Sample size, which is the number of subjects, is another important part of 
experimental design. To improve the validity of experimental results, replication 
is required. 


DEFINITION 


Replication is the repetition of an experiment under the same or similar 
conditions. 


For instance, suppose an experiment is designed to test a vaccine against a 
strain of influenza. In the experiment, 10,000 people are given the vaccine and 
another 10,000 people are given a placebo. Because of the sample size, the 
effectiveness of the vaccine would most likely be observed. But, if the subjects in 
the experiment are not selected so that the two groups are similar (according to 
age and gender), the results are of less value. 
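
One way to see why sample size matters is a small simulation. The sketch below assumes, purely for illustration, that 10% of the placebo group and 5% of the vaccine group become ill; these probabilities and the function names are not from the text.

```python
import random


def infection_rate(n, p, rng):
    """Simulate n subjects who each become ill with probability p."""
    return sum(rng.random() < p for _ in range(n)) / n


def observed_difference(n, p_placebo, p_vaccine, rng):
    """Placebo-group rate minus vaccine-group rate for two groups of size n."""
    return infection_rate(n, p_placebo, rng) - infection_rate(n, p_vaccine, rng)


# Hypothetical illness probabilities, chosen only for illustration.
P_PLACEBO, P_VACCINE = 0.10, 0.05
rng = random.Random(1)

for n in (5, 10_000):
    diffs = [observed_difference(n, P_PLACEBO, P_VACCINE, rng) for _ in range(20)]
    print(f"n = {n:>6}: smallest {min(diffs):+.3f}, largest {max(diffs):+.3f}")
```

With five subjects per group, the observed difference jumps around from one run of the experiment to the next and can even point the wrong way. With 10,000 subjects per group, repeated runs stay close to the assumed difference of 0.05.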


EXAMPLE 2 


> Analyzing an Experimental Design 


A company wants to test the effectiveness of a new gum developed to help 
people quit smoking. Identify a potential problem with the given experimental 
design and suggest a way to improve it. 


1. The company identifies ten adults who are heavy smokers. Five of the 
subjects are given the new gum and the other five subjects are given a 
placebo. After two months, the subjects are evaluated and it is found that 
the five subjects using the new gum have quit smoking. 


2. The company identifies one thousand adults who are heavy smokers. The 
subjects are divided into blocks according to gender. Females are given the 
new gum and males are given the placebo. After two months, a significant 
number of the female subjects have quit smoking. 


> Solution 


1. The sample size being used is not large enough to validate the results of the 
experiment. The experiment must be replicated to improve the validity. 


2. The groups are not similar. The new gum may have a greater effect on 
women than on men, or vice versa. The subjects can be divided into blocks 
according to gender, but then, within each block, they must be randomly 
assigned to be in the treatment group or in the control group. 


> Try It Yourself 2 


Using the information in Example 2, suppose the company identifies 
240 adults who are heavy smokers. The subjects are randomly assigned to be 
in a treatment group or in a control group. Each subject is also given a DVD 
featuring the dangers of smoking. After four months, most of the subjects 
in the treatment group have quit smoking. 


a. Identify a potential problem with the experimental design. 
b. How could the design be improved? Answer: Page A30 
INSIGHT 


A biased sample is one that is not 
representative of the population 
from which it is drawn. 
For instance, a sample 
consisting of only 
18- to 22-year-old 
college students would 
not be representative 
of the entire 18- to 
22-year-old population 
in the country. 


»» To explore this topic further, 
see Activity 1.3 on page 26. 


STUDY TIP 

Here are instructions for using the 
random integer generator on a 
TI-83/84 Plus for Example 3. 


[MATH] 
Choose the PRB menu. 
5: randInt( 
[1] [,] [7] [3] [1] [,] [8] 
[ENTER] 


[Calculator screen: randInt(1,731,8) followed by a list of eight random integers] 


Continuing to press 
[ENTER] will generate 
more random samples 
of 8 integers. 




> SAMPLING TECHNIQUES 


A census is a count or measure of an entire population. Taking a census provides 
complete information, but it is often costly and difficult to perform. A sampling 
is a count or measure of part of a population, and is more commonly used in 
statistical studies. To collect unbiased data, a researcher must ensure that the 
sample is representative of the population. Appropriate sampling techniques 
must be used to ensure that inferences about the population are valid. Remember 
that when a study is done with faulty data, the results are questionable. Even with 
the best methods of sampling, a sampling error may occur. A sampling error is the 
difference between the results of a sample and those of the population. When you 
learn about inferential statistics, you will learn techniques of controlling sampling 
errors. 
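
The idea of a sampling error can be illustrated with a short Python sketch. The population here is made-up data (household sizes for 1,000 hypothetical households), and the sample size of 50 is an arbitrary choice.

```python
import random
import statistics

# Hypothetical population: household sizes for 1,000 households (made-up data).
rng = random.Random(42)
population = [rng.randint(1, 6) for _ in range(1000)]

population_mean = statistics.mean(population)   # what a census would report
sample = rng.sample(population, 50)             # a sampling of 50 households
sample_mean = statistics.mean(sample)

# Sampling error: the sample result minus the population result.
sampling_error = sample_mean - population_mean
print(f"population mean = {population_mean:.3f}")
print(f"sample mean     = {sample_mean:.3f}")
print(f"sampling error  = {sampling_error:+.3f}")
```

In practice the population mean is usually unknown, which is why the sampling error cannot simply be computed this way in a real study.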

A random sample is one in which every member of the population has an 
equal chance of being selected. A simple random sample is a sample in which 
every possible sample of the same size has the same chance of being selected. 
One way to collect a simple random sample is to assign a different number to 
each member of the population and then use a random number table like the one 
in Appendix B. Responses, counts, or measures for members of the population 
whose numbers correspond to those generated using the table would be in the 
sample. Calculators and computer software programs are also used to generate 
random numbers (see page 34). 


Table 1—Random Numbers 


92630 78240 19267 95457 53497 23894 37708 79862 
79445 78735 71549 44843 26104 67318 00701 34986 
59654 71966 27386 50004 05358 94031 29281 18544 
31524 49587 76612 39789 13537 48086 59483 60680 
06348 76938 90379 51392 55887 71015 09209 79157 


Portion of Table 1 found in Appendix B 


Consider a study of the number of people who live in West Ridge County. To use 
a simple random sample to count the number of people who live in West Ridge 
County households, you could assign a different number to each household, use 
a technology tool or table of random numbers to generate a sample of numbers, 
and then count the number of people living in each selected household. 


EXAMPLE 3 (Report 1) 


> Using a Simple Random Sample 


There are 731 students currently enrolled in a statistics course at your school. 
You wish to form a sample of eight students to answer some survey questions. 
Select the students who will belong to the simple random sample. 


> Solution 


Assign numbers 1 to 731 to the students in the course. In the table of random 
numbers, choose a starting place at random and read the digits in groups of 
three (because 731 is a three-digit number). For instance, if you started in the 
third row of the table at the beginning of the second column, you would group 
the numbers as follows: 


719|66 2|738|6 50|004| 053|58 9|403|1 29|281| 185|44 


Ignoring numbers greater than 731, the first eight numbers are 719, 662, 650, 4, 53, 
589, 403, and 129. The students assigned these numbers will make up the sample. 
To find the sample using a TI-83/84 Plus, follow the instructions in the margin. 
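
The table-reading procedure in this solution can also be written as a short Python sketch. The function name is hypothetical, and skipping repeated numbers is an added assumption (sampling without replacement) that the example itself never needs, because no number repeats.

```python
def sample_from_digits(digits, population_size, sample_size):
    """Read digits in fixed-width groups and keep valid, unused member numbers."""
    width = len(str(population_size))          # 731 -> groups of three digits
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        # Skip 0, numbers above the population size, and repeats
        # (skipping repeats assumes sampling without replacement).
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


# Third row of the table portion shown above, starting at the second column.
row = "71966 27386 50004 05358 94031 29281 18544"
digits = row.replace(" ", "")

print(sample_from_digits(digits, 731, 8))
# → [719, 662, 650, 4, 53, 589, 403, 129]
```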
INSIGHT 


For stratified sampling, each 
of the strata contains members 
with a certain characteristic (for 
instance, a particular age group). 
In contrast, clusters consist of 
geographic groupings, and each 
cluster should contain members 
with all of the characteristics 
(for instance, all age 
groups). With stratified 
samples, some of the 
members of each group 
are used. In a cluster 
sampling, all of the 
members of one or 
more groups are used. 

> Try It Yourself 3 
A company employs 79 people. Choose a simple random sample of five to survey. 


a. In the table in Appendix B, randomly choose a starting place. 
b. Read the digits in groups of two. 
c. Write the five random numbers. Answer: Page A30 


When you choose members of a sample, you should decide whether it is 
acceptable to have the same population member selected more than once. If it 
is acceptable, then the sampling process is said to be with replacement. If it is not 
acceptable, then the sampling process is said to be without replacement. 
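
The difference between the two processes shows up directly in Python's standard `random` module: `choices` draws with replacement, and `sample` draws without replacement. The population of 79 numbered members mirrors Try It Yourself 3; the seed is arbitrary.

```python
import random

rng = random.Random(3)
employees = list(range(1, 80))   # members numbered 1 to 79

# With replacement: the same member may be selected more than once.
with_replacement = rng.choices(employees, k=5)

# Without replacement: each member can be selected at most once.
without_replacement = rng.sample(employees, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```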


There are several other commonly used sampling techniques. Each has 
advantages and disadvantages. 


Stratified Sample When it is important for the sample to have members 
from each segment of the population, you should use a stratified sample. 
Depending on the focus of the study, members of the population are divided 
into two or more subsets, called strata, that share a similar characteristic such 
as age, gender, ethnicity, or even political preference. A sample is then 
randomly selected from each of the strata. Using a stratified sample ensures 
that each segment of the population is represented. For instance, to collect a 
stratified sample of the number of people who live in West Ridge County 
households, you could divide the households into socioeconomic levels, and 
then randomly select households from each level. 


[Figure: Stratified Sampling, with households grouped as Group 1: Low income, Group 2: Middle income, Group 3: High income] 


Cluster Sample When the population falls into naturally occurring 
subgroups, each having similar characteristics, a cluster sample may be the 
most appropriate. To select a cluster sample, divide the population into 
groups, called clusters, and select all of the members in one or more (but not 
all) of the clusters. Examples of clusters could be different sections of the 
same course or different branches of a bank. For instance, to collect a cluster 
sample of the number of people who live in West Ridge County households, 
divide the households into groups according to zip codes, then select all the 
households in one or more, but not all, zip codes and count the number of 
people living in each household. In using a cluster sample, care must be taken 
to ensure that all clusters have similar characteristics. For instance, if one of 
the zip code clusters has a greater proportion of high-income people, the data 
might not be representative of the population. 


[Figure: Cluster Sampling, showing zip code zones in West Ridge County] 

Systematic Sample A systematic sample is a sample in which each member 
of the population is assigned a number. The members of the population are 
ordered in some way, a starting number is randomly selected, and then 
sample members are selected at regular intervals from the starting number. 
(For instance, every 3rd, 5th, or 100th member is selected.) For instance, to 
collect a systematic sample of the number of people who live in West Ridge 
County households, you could assign a different number to each household, 
randomly choose a starting number, select every 100th household, and count 
the number of people living in each. An advantage of systematic sampling is 
that it is easy to use. In the case of any regularly occurring pattern in the data, 
however, this type of sampling should be avoided. 


[Figure: Systematic Sampling] 
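
The three techniques just described (stratified, cluster, and systematic sampling) can be compared side by side in a short Python sketch. The household list, the income levels, the zone names, and the function names are all hypothetical and serve only to mirror the West Ridge County illustrations.

```python
import random
from collections import defaultdict

# Hypothetical households: (household number, income level, zip code zone).
rng = random.Random(11)
LEVELS = ["low", "middle", "high"]
ZONES = ["zone 1", "zone 2", "zone 3", "zone 4"]
households = [(i, rng.choice(LEVELS), rng.choice(ZONES)) for i in range(1, 1001)]


def group_by(items, key):
    """Collect items into lists keyed by key(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return groups


def stratified_sample(items, key, per_stratum, rng):
    """Randomly select some members from every stratum."""
    strata = group_by(items, key)
    return [m for members in strata.values()
            for m in rng.sample(members, per_stratum)]


def cluster_sample(items, key, n_clusters, rng):
    """Randomly select whole clusters and keep all of their members."""
    clusters = group_by(items, key)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [m for name in picked for m in clusters[name]]


def systematic_sample(items, interval, rng):
    """Pick a random starting position, then take every interval-th member."""
    start = rng.randrange(interval)
    return items[start::interval]


by_income = stratified_sample(households, lambda h: h[1], 10, rng)
by_zone = cluster_sample(households, lambda h: h[2], 1, rng)
every_100th = systematic_sample(households, 100, rng)

print(len(by_income), "households in the stratified sample")
print(len(by_zone), "households in the cluster sample")
print([h[0] for h in every_100th], "household numbers in the systematic sample")
```

Note that the stratified sample uses some households from every income level, whereas the cluster sample uses every household from the selected zone and none from the others.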


A type of sample that often leads to biased studies (so it is not recommended) 
is a convenience sample. A convenience sample consists only of available members 
of the population. 


EXAMPLE 4 


> Identifying Sampling Techniques 

You are doing a study to determine the opinions of students at your school 
regarding stem cell research. Identify the sampling technique you are using if 
you select the samples listed. Discuss potential sources of bias (if any). Explain. 


1. You divide the student population with respect to majors and randomly 
select and question some students in each major. 


2. You assign each student a number and generate random numbers. You then 
question each student whose number is randomly selected. 


3. You select students who are in your biology class. 


> Solution 


1. Because students are divided into strata (majors) and a sample is selected 
from each major, this is a stratified sample. 


2. Each sample of the same size has an equal chance of being selected and 
each student has an equal chance of being selected, so this is a simple 
random sample. 


3. Because the sample is taken from students that are readily available, this is 
a convenience sample. The sample may be biased because biology students 
may be more familiar with stem cell research than other students and may 
have stronger opinions. 


> Try It Yourself 4 


You want to determine the opinions of students regarding stem cell research. 
Identify the sampling technique you are using if you select the samples listed. 


1. You select a class at random and question each student in the class. 


2. You assign each student a number and, after choosing a starting number, 
question every 25th student. 


a. Determine how the sample is selected and identify the corresponding 
sampling technique. 
b. Discuss potential sources of bias (if any). Explain. Answer: Page A30 
BUILDING BASIC SKILLS AND VOCABULARY 


1. What is the difference between an observational study and an experiment? 

2. What is the difference between a census and a sampling? 

3. What is the difference between a random sample and a simple random 
sample? 

4. What is replication in an experiment, and why is it important? 


True or False? In Exercises 5-10, determine whether the statement is true or 
false. If it is false, rewrite it as a true statement. 


5. In a randomized block design, subjects with similar characteristics are 
divided into blocks, and then, within each block, randomly assigned to 
treatment groups. 

6. A double-blind experiment is used to increase the placebo effect. 

7. Using a systematic sample guarantees that members of each group within a 
population will be sampled. 

8. A census is a count of part of a population. 

9. The method for selecting a stratified sample is to order a population in some 
way and then select members of the population at regular intervals. 

10. To select a cluster sample, divide a population into groups and then select all 
of the members in at least one (but not all) of the groups. 


Deciding on the Method of Data Collection In Exercises 11-16, explain 
which method of data collection you would use to collect data for the study. 


11. A study of the health of 168 kidney transplant patients at a hospital 

12. A study of motorcycle helmet usage in a city without a helmet law 

13. A study of the effect on the human digestive system of potato chips made 
with a fat substitute 

14. A study of the effect of a product’s warning label to determine whether 
consumers will still buy the product 

15. A study of how fast a virus would spread in a metropolitan area 

16. A study of how often people wash their hands in public restrooms 


USING AND INTERPRETING CONCEPTS 


17. Allergy Drug A pharmaceutical company wants to test the effectiveness of 
a new allergy drug. The company identifies 250 females 30-35 years old who 
suffer from severe allergies. The subjects are randomly assigned into two 
groups. One group is given the new allergy drug and the other is given a 
placebo that looks exactly like the new allergy drug. After six months, the 
subjects’ symptoms are studied and compared. 


(a) Identify the experimental units and treatments used in this experiment. 


(b) Identify a potential problem with the experimental design being used 
and suggest a way to improve it. 


(c) How could this experiment be designed to be double-blind? 

18. Sneakers Nike developed a new type of sneaker designed to help delay the 
onset of arthritis in the knee. Eighty people with early signs of arthritis 
volunteered for a study. One-half of the volunteers wore the experimental 
sneakers and the other half wore regular Nike sneakers that looked exactly 
like the experimental sneakers. The individuals wore the sneakers every day. 
At the conclusion of the study, their symptoms were evaluated and MRI tests 
were performed on their knees. (Source: Washington Post) 


(a) Identify the experimental units and treatments used in this experiment. 


(b) Identify a potential problem with the experimental design being used 
and suggest a way to improve it. 

(c) The experiment is described as a placebo-controlled, double-blind study. 
Explain what this means. 

(d) Of the 80 volunteers, suppose 40 are men and 40 are women. How could 
blocking be used in designing this experiment? 


Identifying Sampling Techniques In Exercises 19-26, identify the sampling 
technique used, and discuss potential sources of bias (if any). Explain. 


19. Using random digit dialing, researchers call 1400 people and ask what 
obstacles (such as childcare) keep them from exercising. 

20. Chosen at random, 500 rural and 500 urban persons age 65 or older are asked 
about their health and their experience with prescription drugs. 

21. Questioning students as they leave a university library, a researcher asks 
358 students about their drinking habits. 

22. After a hurricane, a disaster area is divided into 200 equal grids. Thirty of the 
grids are selected, and every occupied household in the grid is interviewed to 
help focus relief efforts on what residents require the most. 

23. Chosen at random, 580 customers at a car dealership are contacted and 
asked their opinions of the service they received. 

24. Every tenth person entering a mall is asked to name his or her favorite store. 

25. Soybeans are planted on a 48-acre field. The field is divided into one-acre 
subplots. A sample is taken from each subplot to estimate the harvest. 

26. From calls made with randomly generated telephone numbers, 1012 
respondents are asked if they rent or own their residences. 

27. Random Number Table Use the seventh row of Table 1 in Appendix B to 
generate 12 random numbers between 1 and 99. 

28. Random Number Table Use the twelfth row of Table 1 in Appendix B to 
generate 10 random numbers between 1 and 920. 

29. Sleep Deprivation A researcher wants to study the effects of sleep 
deprivation on motor skills. Eighteen people volunteer for the experiment: 
Jake, Maria, Mike, Lucy, Ron, Adam, Bridget, Carlos, Steve, Susan, Vanessa, 
Rick, Dan, Kate, Pete, Judy, Mary, and Connie. Use a random number 
generator to choose nine subjects for the treatment group. The other nine 
subjects will go into the control group. List the subjects in each group. Tell 
which method you would use to generate the random numbers. 

30. Random Number Generation Volunteers for an experiment are numbered 
from 1 to 70. The volunteers are to be randomly assigned to two different 
treatment groups. Use a random number generator different from the one you 
used in Exercise 29 to choose 35 subjects for the treatment group. The other 35 
subjects will go into the control group. List the subjects, according to number, 
in each group. Tell which method you used to generate the random numbers. 

Choosing Between a Census and a Sampling In Exercises 31 and 32, 
determine whether you would take a census or use a sampling. If you would 
use a sampling, decide what sampling technique you would use. Explain your 
reasoning. 


31. The average age of the 115 residents of a retirement community 


32. The most popular type of movie among 100,000 online movie rental subscribers 


Recognizing a Biased Question In Exercises 33-36, determine whether the 
survey question is biased. If the question is biased, suggest a better wording. 


33. Why does eating whole-grain foods improve your health? 
34. Why does text messaging while driving increase the risk of a crash? 
35. How much do you exercise during an average week? 


36. Why do you think the media have a negative effect on teen girls’ dieting 
habits? 


37. Writing A sample of television program ratings by The Nielsen Company 
is described on page 15. Discuss the strata used in the sample. Why is it 
important to have a stratified sample for these ratings? 

38. Use StatCrunch to generate the following random numbers. 

a. 8 numbers between 1 and 50 

b. 15 numbers between 1 and 150 
c. 16 numbers between 1 and 325 
d. 20 numbers between 1 and 1000 


EXTENDING CONCEPTS 


39. Observational studies are sometimes referred to as natural experiments. 
Explain, in your own words, what this means. 


40. Open and Closed Questions Two types of survey questions are open 
questions and closed questions. An open question allows for any kind of 
response; a closed question allows for only a fixed response. An open 
question, and a closed question with its possible choices, are given below. List 
an advantage and a disadvantage of each question. 

Open Question What can be done to get students to eat healthier foods? 
Closed Question How would you get students to eat healthier foods? 
1. Mandatory nutrition course 
2. Offer only healthy foods in the cafeteria and remove unhealthy foods 
3. Offer more healthy foods in the cafeteria and raise the prices on 
unhealthy foods 


41. Who Picked These People? Some polling agencies ask people to call 
a telephone number and give their response to a question. (a) List an 
advantage and a disadvantage of a survey conducted in this manner. 
(b) What sampling technique is used in such a survey? 


42. Give an example of an experiment where confounding may occur. 
43. Why is it important to use blinding in an experiment? 


44. How are the placebo effect and the Hawthorne effect similar? How are they 
different? 


45. How is a randomized block design in experiments similar to a stratified sample? 


Random Numbers 


The random numbers applet is designed to allow you to generate random 
numbers from a range of values. You can specify integer values for the minimum 
value, maximum value, and the number of samples in the appropriate fields. You 
should not use decimal points when filling in the fields. When SAMPLE is clicked, 
the applet generates random values, which are displayed as a list in the text field. 


[Applet: input fields for Minimum value, Maximum value, and Number of samples, with a SAMPLE button] 


Explore 


Step 1 Specify a minimum value. 

Step 2 Specify a maximum value. 

Step 3 Specify the number of samples. 

Step 4 Click SAMPLE to generate a list of random values. 


Draw Conclusions


1. Specify the minimum, maximum, and number of samples to be 1, 20, and 8, 
respectively, as shown. Run the applet. Continue generating lists until you 
obtain one that shows that the random sample is taken with replacement. 
Write down this list. How do you know that the list is a random sample taken 
with replacement? 


Minimum value: 1 


Maximum value: 20 


Number of samples: 8


Sample 


2. Use the applet to repeat Example 3 on page 20. What values did you use for 
the minimum, maximum, and number of samples? Which method do you 
prefer? Explain. 


USES AND ABUSES


Uses

Experiments with Favorable Results An experiment that began
in March 2003 studied 321 women with advanced breast cancer. All of the 
women had been previously treated with other drugs, but the cancer had stopped 
responding to the medications. The women were then given the opportunity to 
take a new drug combined with a particular chemotherapy drug. 

The subjects were divided into two groups, one that took the new drug 
combined with a chemotherapy drug, and one that took only the chemotherapy 
drug. After three years, results showed that the new drug in combination with 
the chemotherapy drug delayed the progression of cancer in the subjects. The 
results were so significant that the study was stopped, and the new drug was 
offered to all women in the study. The Food and Drug Administration has 
since approved use of the new drug in conjunction with a chemotherapy drug. 


Abuses 


Experiments with Unfavorable Results From 1988 to 1991, one hundred 
eighty thousand teenagers in Norway were used as subjects to test a new vaccine 
against the deadly bacteria meningococcus b. A brochure describing the possible
effects of the vaccine stated, “it is unlikely to expect serious complications,”
while information provided to the Norwegian Parliament stated, “serious side 
effects can not be excluded.” The vaccine trial had some disastrous results: More 
than 500 side effects were reported, with some considered serious, and several 
of the subjects developed serious neurological diseases. The results showed that 
the vaccine was providing immunity in only 57% of the cases. This result was 
not sufficient for the vaccine to be added to Norway’s vaccination program. 
Compensations have since been paid to the vaccine victims. 


Ethics 


Experiments help us further understand the world that surrounds us. But, in 
some cases, they can do more harm than good. In the Norwegian experiment,
several ethical questions arise. Was the Norwegian experiment unethical if the
best interests of the subjects were neglected? When should the experiment
have been stopped? Should it have been conducted at all? If serious side
effects are not reported and are withheld from subjects, there is no ethical
question here; it is just wrong.

On the other hand, the breast cancer researchers would not want to deny 
the new drug to a group of patients with a life-threatening disease. But again, 
questions arise. How long must a researcher continue an experiment that 
shows better-than-expected results? How soon can a researcher conclude a 
drug is safe for the subjects involved? 


EXERCISES


1. Unfavorable Results Find an example of a real-life experiment that had 
unfavorable results. What could have been done to avoid the outcome of 
the experiment? 


2. Stopping an Experiment In your opinion, what are some problems that 
may arise if clinical trials of a new experimental drug or vaccine are 
stopped early and then the drug or vaccine is distributed to other subjects 
or patients? 
CHAPTER SUMMARY


What did you learn?                                                          EXAMPLE(S)   REVIEW EXERCISES

Section 1.1
How to distinguish between a population and a sample                         1            1-4
How to distinguish between a parameter and a statistic                       2            5-8
How to distinguish between descriptive statistics and inferential statistics 3            9, 10

Section 1.2
How to distinguish between qualitative data and quantitative data            1            11-16
How to classify data with respect to the four levels of measurement:         2, 3         17-20
nominal, ordinal, interval, and ratio

Section 1.3
How data are collected: by doing an observational study, performing an       1            21-24
experiment, using a simulation, or using a survey
How to design an experiment                                                  2            25, 26
How to create a sample using random sampling, simple random sampling,        3, 4         27-34
stratified sampling, cluster sampling, and systematic sampling
How to identify a biased sample                                              4            35-38
REVIEW EXERCISES


SECTION 1.1


In Exercises 1-4, identify the population and the sample.


1. A survey of 1000 U.S. adults found that 83% think credit cards tempt people
to buy things they cannot afford. (Source: Rasmussen Reports)


2. Thirty-eight nurses working in the San Francisco area were surveyed
concerning their opinions of managed health care.


3. A survey of 39 credit cards found that the average annual percentage rate
(APR) is 12.83%. (Source: Consumer Action)


4. A survey of 1205 physicians found that about 60% had considered leaving
the practice of medicine because they were discouraged over the state of U.S.
health care. (Source: The Physician Executive Journal of Medical Management)


In Exercises 5-8, determine whether the numerical value describes a parameter or
a statistic.


5. The 2009 team payroll of the Philadelphia Phillies was $113,004,046. (Source:
USA Today)


6. In a survey of 752 adults in the United States, 42% think there should be a
law that prohibits people from talking on cell phones in public places.
(Source: University of Michigan)


7. In a recent study of math majors at a university, 10 students were minoring in
physics.


8. Fifty percent of a sample of 1508 U.S. adults say they oppose drilling for oil
and gas in the Arctic National Wildlife Refuge. (Source: Pew Research Center)


9. Which part of the study described in Exercise 3 represents the descriptive
branch of statistics? Make an inference based on the results of the study.


10. Which part of the survey described in Exercise 4 represents the descriptive
branch of statistics? Make an inference based on the results of the survey.


SECTION 1.2


In Exercises 11-16, determine which data are qualitative data and which are
quantitative data. Explain your reasoning.


11. The monthly salaries of the employees at an accounting firm
12. The Social Security numbers of the employees at an accounting firm
13. The ages of a sample of 350 employees of a software company
14. The zip codes of a sample of 350 customers at a sporting goods store
15. The 2010 revenues of the companies on the Fortune 500 list
16. The marital statuses of all professional golfers


In Exercises 17-20, identify the data set’s level of measurement. Explain your
reasoning.


17. The daily high temperatures (in degrees Fahrenheit) for Mohave, Arizona for
a week in June are listed. (Source: Arizona Meteorological Network)

93 91 86 94 103 104 103


18. The levels of the Homeland Security Advisory System are listed.

Severe High Elevated Guarded Low


19. The four departments of a printing company are listed.


Administration Sales Production Billing 


20. The total compensations (in millions of dollars) of the top ten female CEOs 
in the United States are listed. (Source: Forbes) 


94 53 11.8 11.1 94 41 66 5.7 46 4.5 


SECTION 1.3


In Exercises 21-24, decide which method of data collection you would use to 
collect data for the study. Explain your reasoning. 


21. A study of charitable donations of the CEOs in Syracuse, New York 

22. A study of the effect of koalas on the ecosystem of Kangaroo Island, Australia 
23. A study of how training dogs from animal shelters affects inmates at a prison 
24. A study of college professors’ opinions on teaching classes online 

In Exercises 25 and 26, an experiment is being performed to test the effects of sleep
deprivation on memory recall. Two hundred students volunteer for the experiment.
The students will be placed in one of five different treatment groups, including the
control group.


25. Explain how you could design an experiment so that it uses a randomized 
block design. 


26. Explain how you could design an experiment so that it uses a completely 
randomized design. 


27. Random Number Table Use the fifth row of Table 1 in Appendix B to 
generate 8 random numbers between 1 and 650. 


28. Census or Sampling? You want to know the favorite spring break 
destination among 15,000 students at a university. Decide whether you would 
take a census or use a sampling. If you would use a sampling, decide what 
technique you would use. Explain your reasoning. 


In Exercises 29-34, identify the sampling technique used in the study. Explain your 
reasoning. 


29. Using random digit dialing, researchers ask 1003 U.S. adults their plans on 
working during retirement. (Source: Princeton Survey Research Associates 
International) 


30. A student asks 18 friends to participate in a psychology experiment. 


31. A pregnancy study in Cebu, Philippines randomly selects 33 communities from 
the Cebu metropolitan area, then interviews all available pregnant women in 
these communities. (Adapted from Cebu Longitudinal Health and Nutrition Survey) 


32. Law enforcement officials stop and check the driver of every third vehicle for 
blood alcohol content. 


33. Twenty-five students are randomly selected from each grade level at a high 
school and surveyed about their study habits. 


34. A journalist interviews 154 people waiting at an airport baggage claim and 
asks them how safe they feel during air travel. 


In Exercises 35-38, identify a bias or error that might occur in the indicated survey
or study.

35. study in Exercise 29
36. experiment in Exercise 30
37. study in Exercise 31
38. sampling in Exercise 32


CHAPTER QUIZ


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


1. Identify the population and the sample in the following study.

A study of the dietary habits of 20,000 men was conducted to find a link
between high intakes of dairy products and prostate cancer. (Source: Harvard
School of Public Health)


2. Determine whether the numerical value is a parameter or a statistic.


(a) In a survey of 2253 Internet users, 19% use Twitter or another service to 
share social updates. (Source: Pew Internet Project) 

(b) At a college, 90% of the Board of Trustees members approved the 
contract of the new president. 

(c) A survey of 846 chief financial officers and senior comptrollers shows 
that 55% of U.S. companies are reducing bonuses. (Source: Grant Thornton 
International) 


3. Determine whether the data are qualitative or quantitative.


(a) A list of debit card pin numbers 
(b) The final scores on a video game 


4. Identify each data set’s level of measurement. Explain your reasoning.


(a) A list of badge numbers of police officers at a precinct 

(b) The horsepowers of racing car engines 

(c) The top 10 grossing films released in 2010 

(d) The years of birth for the runners in the Boston marathon 


5. Decide which method of data collection you would use to gather data for each
study. Explain your reasoning.


(a) A study on the effect of low dietary intake of vitamin C and iron on lead 
levels in adults 


(b) The ages of people living within 500 miles of your home 


6. An experiment is being performed to test the effects of a new drug on high
blood pressure. The experimenter identifies 320 people ages 35-50 years old
with high blood pressure for participation in the experiment. The subjects are 
divided into equal groups according to age. Within each group, subjects are 
then randomly selected to be in either the treatment group or the control 
group. What type of experimental design is being used for this experiment? 


7. Identify the sampling technique used in each study. Explain your reasoning.


(a) A journalist goes to a campground to ask people how they feel about air 
pollution. 

(b) For quality assurance, every tenth machine part is selected from an 
assembly line and measured for accuracy. 

(c) A study on attitudes about smoking is conducted at a college. The students
are divided by class (freshman, sophomore, junior, and senior). Then a 
random sample is selected from each class and interviewed. 


8. Which sampling technique used in Exercise 7 could lead to a biased study?


Real Statistics - Real Decisions


You are a researcher for a professional research firm. Your firm has 
won a contract to do a study for an air travel industry publication. The 
editors of the publication would like to know their readers’ thoughts 
on air travel factors such as ticket purchase, services, safety, comfort, 
economic growth, and security. They would also like to know the 
thoughts of adults who use air travel for business as well as for 
recreation. 

The editors have given you their readership database and 20
questions they would like to ask (two sample questions from a previous
study are given below). You know that it is too expensive to
contact all of the readers, so you need to determine a way to contact a
representative sample of the entire readership population.


How did you acquire your ticket?

Response                                             Percent
Travel agent                                         35.1%
Directly from airline                                20.9%
Online, using the airline’s website                  21.0%
Online, from a travel site other than the airline    18.5%

(Source: Resource Systems Group)


How many associates, friends, or family members traveled together in your party?

Response                                      Percent
1 (traveled alone)                            48.7%
2 (traveled with one other person)            29.7%
3 (traveled with 2 others)                    7.1%
4 (traveled with 3 others)                    7.7%
5 (traveled with 4 others)                    3.0%
6 or more (traveled with 5 or more others)    3.8%

(Source: Resource Systems Group)


1. How Would You Do It?

(a) What sampling technique would you use to select the sample for
the study? Why?

(b) Will the technique you choose in part (a) give you a sample that
is representative of the population?

(c) Describe the method for collecting data.

(d) Identify possible flaws or biases in your study.

2. Data Classification

(a) What type of data do you expect to collect: qualitative,
quantitative, or both? Why?

(b) At what levels of measurement do you think the data in the
study will be? Why?

(c) Will the data collected for the study represent a population or a
sample?

(d) Will the numerical descriptions of the data be parameters
or statistics?

3. How They Did It


When the Resource Systems Group did a similar study, they used 
an Internet survey. They sent out 1000 invitations to participate in 
the survey and received 621 completed surveys. 


(a) Describe some possible errors in collecting data by Internet 
surveys. 


(b) Compare your method for collecting data in Exercise 1 to this 
method. 
HISTORY OF STATISTICS - TIMELINE


CONTRIBUTOR (TIME) AND CONTRIBUTION


John Graunt (1620-1674)
Studied records of deaths in London in the early 1600s. The first
to make extensive statistical observations from massive amounts of
data (Chapter 2), his work laid the foundation for modern statistics.


Blaise Pascal (1623-1662) and Pierre de Fermat (1601-1665)
Pascal and Fermat corresponded about basic probability problems
(Chapter 3), especially those dealing with gaming and gambling.


Pierre Laplace (1749-1827)
Studied probability (Chapter 3) and is credited with putting
probability on a sure mathematical footing.


Carl Friedrich Gauss (1777-1855)
Studied regression and the method of least squares (Chapter 9)
through astronomy. In his honor, the normal distribution is
sometimes called the Gaussian distribution.


Lambert Quetelet (1796-1874)
Used descriptive statistics (Chapter 2) to analyze crime and mortality
data and studied census techniques. Described normal distributions
(Chapter 5) in connection with human traits such as height.


Francis Galton (1822-1911)
Used regression and correlation (Chapter 9) to study genetic
variation in humans. He is credited with the discovery of the Central
Limit Theorem (Chapter 5).


Karl Pearson (1857-1936)
Studied natural selection using correlation (Chapter 9). Formed first
academic department of statistics and helped develop chi-square
analysis (Chapter 6).


William Gosset (1876-1937)
Studied process of brewing and developed t-test to correct problems
connected with small sample sizes (Chapter 6).


Charles Spearman (1863-1945)
British psychologist who was one of the first to develop intelligence
testing using factor analysis (Chapter 10).


Ronald Fisher (1890-1962)
Studied biology and natural selection and developed ANOVA
(Chapter 10), stressed the importance of experimental design
(Chapter 1), and was the first to identify the null and alternative
hypotheses (Chapter 7).


Frank Wilcoxon (1892-1965)
Biochemist who used statistics to study plant pathology. He
introduced two-sample tests (Chapter 8), which led the way to
the development of nonparametric statistics.


John Tukey (1915-2000)
Worked at Princeton during World War II. Introduced exploratory
data analysis techniques such as stem-and-leaf plots (Chapter 2).
Also, worked at Bell Laboratories and is best known for his work
in inferential statistics (Chapters 6-11).


David Kendall (1918-2007)
Worked at Princeton and Cambridge. Was a leading authority on
applied probability and data analysis (Chapters 2 and 3).
USING TECHNOLOGY IN STATISTICS


With large data sets, you will find that calculators or computer software 
programs can help perform calculations and create graphics. Of the 
many calculators and statistical software programs that are available, 
we have chosen to incorporate the TI-83/84 Plus graphing calculator, and 
MINITAB and Excel software into this text. 

The following example shows how to use these three technologies 
to generate a list of random numbers. This list of random numbers can 
be used to select sample members or perform simulations. 


EXAMPLE 


> Generating a List of Random Numbers


A quality control department inspects a random sample of 15 of the 
167 cars that are assembled at an auto plant. How should the cars be 
chosen? 


> Solution 


One way to choose the sample is to first number the cars from 1 to 
167. Then you can use technology to form a list of random numbers 
from 1 to 167. Each of the technology tools shown requires different 
steps to generate the list. Each, however, does require that you 
identify the minimum value as 1 and the maximum value as 167. 
Check your user’s manual for specific instructions. 
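
The same idea can be sketched in a general-purpose language. The Python snippet below is only an illustration and is not one of the three technologies used in this text; the function name random_car_numbers and the optional seed argument are assumptions made for the example. Like the tools shown, it can produce repeated numbers.

```python
import random


def random_car_numbers(count=15, minimum=1, maximum=167, seed=None):
    """Return `count` random integers from minimum to maximum, inclusive.

    Values may repeat, so this is sampling with replacement.
    """
    rng = random.Random(seed)  # a seed only makes the list repeatable
    return [rng.randint(minimum, maximum) for _ in range(count)]


cars = random_car_numbers(seed=1)
print(cars)  # 15 car numbers between 1 and 167
```

Each number in the list identifies one car to inspect. If a number appears twice, the next section on sampling with and without replacement applies.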


[Screen output from MINITAB, Excel, and the TI-83/84 Plus: each tool displays a list of 15 random integers from 1 to 167. The TI-83/84 Plus command shown is randInt(1, 167, 15).]


Recall that when you generate a list of random numbers, you
should decide whether it is acceptable to have numbers that repeat. If it
is acceptable, then the sampling process is said to be with replacement.
If it is not acceptable, then the sampling process is said to be without
replacement.


With each of the three technology tools shown on page 34, you have 
the capability of sorting the list so that the numbers appear in order. 
Sorting helps you see whether any of the numbers in the list repeat. If 
it is not acceptable to have repeats, you should specify that the tool 
generate more random numbers than you need. 
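
As a rough illustration of this advice, the Python sketch below sorts a list to expose repeats and then keeps only the first distinct values from a list that is longer than needed. The list `generated` is hypothetical, and the helper names find_repeats and first_distinct are invented for this example.

```python
import random
from collections import Counter


def find_repeats(numbers):
    """Return, in sorted order, the values that appear more than once."""
    counts = Counter(numbers)
    return sorted(value for value, times in counts.items() if times > 1)


def first_distinct(numbers, needed):
    """Keep the first `needed` distinct values, skipping any repeats."""
    kept = []
    for value in numbers:
        if value not in kept:
            kept.append(value)
        if len(kept) == needed:
            break
    return kept


# Hypothetical list of 8 generated numbers when only 6 are needed.
generated = [17, 42, 152, 59, 42, 116, 152, 5]

print(sorted(generated))             # sorting places repeated values side by side
print(find_repeats(generated))       # [42, 152]
print(first_distinct(generated, 6))  # [17, 42, 152, 59, 116, 5]

# A different route: random.sample never repeats a value.
no_repeats = random.sample(range(1, 168), 15)
```

Generating extra numbers and discarding repeats follows the advice in the text. The final line uses a different technique, drawing without replacement directly.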


EXERCISES


1. The SEC (Securities and Exchange Commission)
is investigating a financial services company.
The company being investigated has 86
brokers. The SEC decides to review the records
for a random sample of 10 brokers. Describe
how this investigation could be done. Then use
technology to generate a list of 10 random
numbers from 1 to 86 and order the list.


2. A quality control department is testing
25 smartphones from a shipment of 300
smartphones. Describe how this test could be
done. Then use technology to generate a list of
25 random numbers from 1 to 300 and order
the list.


3. Consider the population of ten digits: 0, 1, 2, 3,
4, 5, 6, 7, 8, and 9. Select three random samples
of five digits from this list. Find the average of
each sample. Compare your results with the
average of the entire population. Comment on
your results. (Hint: To find the average, sum
the data entries and divide the sum by the
number of entries.)


4. Consider the population of 41 whole numbers
from 0 to 40. What is the average of these
numbers? Select three random samples of
seven numbers from this list. Find the average
of each sample. Compare your results with the
average of the entire population. Comment on
your results. (Hint: To find the average, sum
the data entries and divide the sum by the
number of entries.)


5. Use random numbers to simulate rolling a
six-sided die 60 times. How many times did you
obtain each number from 1 to 6? Are the
results what you expected?


6. You rolled a six-sided die 60 times and got the
following tally.

20 ones    20 twos    15 threes
3 fours    2 fives    0 sixes

Does this seem like a reasonable result? What
inference might you draw from the result?


7. Use random numbers to simulate tossing a coin
100 times. Let 0 represent heads, and let 1
represent tails. How many times did you
obtain each number? Are the results what you
expected?


8. You tossed a coin 100 times and got 77 heads
and 23 tails. Does this seem like a reasonable
result? What inference might you draw from
the result?


9. A political analyst would like to survey a
sample of the registered voters in a county.
The county has 47 election districts. How could
the analyst use random numbers to obtain a
cluster sample?


Extended solutions are given in the Technology Supplement. 
Technical instruction is provided for MINITAB, Excel, and the TI-83/84 Plus. 
DESCRIPTIVE STATISTICS


2.1 Frequency Distributions and Their Graphs

2.2 More Graphs and Displays

2.3 Measures of Central Tendency
    ACTIVITY

2.4 Measures of Variation
    ACTIVITY
    CASE STUDY

2.5 Measures of Position
    USES AND ABUSES
    REAL STATISTICS - REAL DECISIONS
    TECHNOLOGY


Brothers Sam and Bud Walton opened the
first Wal-Mart store in 1962. Today, the Walton
family is one of the richest families in the world.
Members of the Walton family held four spots
in the top 50 richest people in the world in 2009.
WHERE YOU'VE BEEN


In Chapter 1, you learned that there are many 
ways to collect data. Usually, researchers must 
work with sample data in order to analyze 
populations, but occasionally it is possible to 
collect all the data for a given population. For 
instance, the following represents the ages of the 
50 richest people in the world in 2009. 


89, 89, 87, 86, 86, 85, 83, 83, 82, 81, 80, 78, 78, 77,
76, 73, 73, 73, 72, 69, 69, 68, 67, 66, 66, 65, 65, 64,
63, 61, 61, 60, 59, 58, 57, 56, 54, 54, 53, 53, 51, 51,
49, 47, 46, 44, 43, 42, 36, 35


WHERE YOU’RE GOING


In Chapter 2, you will learn ways to organize and
describe data sets. The goal is to make the data
easier to understand by describing trends,
averages, and variations. For instance, in the raw
data showing the ages of the 50 richest people in
the world in 2009, it is not easy to see any
patterns or special characteristics. Here are some
ways you can organize and describe the data.


Make a frequency distribution table.

Class    Frequency, f
35-41    2
42-48    5
49-55    7
56-62    7
63-69    10
70-76    5
77-83    8
84-90    6


Draw a histogram.

[Histogram: frequency versus age, with class boundaries 34.5, 41.5, 48.5, 55.5, 62.5, 69.5, 76.5, 83.5, and 90.5]


$$\text{Mean} = \frac{89 + 89 + 87 + \cdots + 36 + 35}{50} = \frac{3263}{50} = 65.26 \text{ years old}$$


Find how the data vary.

$$\text{Range} = 89 - 35 = 54 \text{ years}$$
WHAT YOU SHOULD LEARN


> How to construct a frequency
distribution including limits,
midpoints, relative frequencies,
cumulative frequencies,
and boundaries

> How to construct frequency
histograms, frequency
polygons, relative frequency
histograms, and ogives


Example of a
Frequency Distribution

Class    Frequency, f
1-5      5
6-10     8
11-15    6
16-20    8
21-25    5
26-30    4


STUDY TIP


In a frequency distribution,
it is best if each class has the
same width. Answers shown
will use the minimum
data value for the lower
limit of the first class.
Sometimes it may be
more convenient to
choose a lower limit
that is slightly lower
than the minimum value.
The frequency distribution
produced will vary slightly.


Frequency Distributions and Their Graphs 


Frequency Distributions > Graphs of Frequency Distributions 


>» FREQUENCY DISTRIBUTIONS 


You will learn that there are many ways to organize and describe a data set. 
Important characteristics to look for when organizing and describing a data set 
are its center, its variability (or spread), and its shape. Measures of center and 
shapes of distributions are covered in Section 2.3. 

When a data set has many entries, it can be difficult to see patterns. In this 
section, you will learn how to organize data sets by grouping the data into 
intervals called classes and forming a frequency distribution. You will also learn 
how to use frequency distributions to construct graphs. 


DEFINITION 


A frequency distribution is a table that shows classes or intervals of data 
entries with a count of the number of entries in each class. The frequency f of 
a class is the number of data entries in the class. 


In the frequency distribution shown at the left there are six classes. The
frequencies for each of the six classes are 5, 8, 6, 8, 5, and 4. Each class has a lower
class limit, which is the least number that can belong to the class, and an upper
class limit, which is the greatest number that can belong to the class. In the
frequency distribution shown, the lower class limits are 1, 6, 11, 16, 21, and 26, and
the upper class limits are 5, 10, 15, 20, 25, and 30. The class width is the distance
between lower (or upper) limits of consecutive classes. For instance, the class
width in the frequency distribution shown is 6 - 1 = 5.

The difference between the maximum and minimum data entries is called
the range. In the frequency table shown, suppose the maximum data entry is 29,
and the minimum data entry is 1. The range then is 29 - 1 = 28. You will learn
more about the range of a data set in Section 2.4.


GUIDELINES 


Constructing a Frequency Distribution from a Data Set 


1. Decide on the number of classes to include in the frequency distribution. 
The number of classes should be between 5 and 20; otherwise, it may be 
difficult to detect any patterns. 

2. Find the class width as follows. Determine the range of the data, divide 
the range by the number of classes, and round up to the next convenient 
number. 

3. Find the class limits. You can use the minimum data entry as the lower 
limit of the first class. To find the remaining lower limits, add the class 
width to the lower limit of the preceding class. Then find the upper limit 
of the first class. Remember that classes cannot overlap. Find the 
remaining upper class limits. 

4. Make a tally mark for each data entry in the row of the appropriate class.


5. Count the tally marks to find the total frequency f for each class. 
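
Steps 1 through 3 of the guidelines can be sketched in code. The Python function below is an illustration under two assumptions: the data entries are whole numbers, and the "next convenient number" is taken to be the next whole number above the quotient. The name class_limits is invented for this example. The inputs (minimum 35, maximum 89, eight classes) come from the ages data set in the chapter opener.

```python
def class_limits(minimum, maximum, num_classes):
    """Return (class_width, [(lower, upper), ...]) for whole-number data."""
    data_range = maximum - minimum
    # Round the quotient up to the next whole number. Integer division plus 1
    # also moves an exact whole-number quotient up by one.
    class_width = data_range // num_classes + 1
    limits = []
    lower = minimum
    for _ in range(num_classes):
        # Classes cannot overlap, so each upper limit is one less than
        # the next lower limit.
        limits.append((lower, lower + class_width - 1))
        lower += class_width
    return class_width, limits


width, limits = class_limits(35, 89, 8)
print(width)       # 7
print(limits[0])   # (35, 41)
print(limits[-1])  # (84, 90)
```

Steps 4 and 5 then amount to counting how many data entries fall between each pair of limits.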
EXAMPLE 1


> Constructing a Frequency Distribution from a Data Set 


The following sample data set lists the prices (in dollars) of 30 portable global 
positioning system (GPS) navigators. Construct a frequency distribution that 
has seven classes. 


90 130 400 200 350 70 325 250 150 250 
275 270 150 130 59 200 160 450 300 130 
220 100 200 400 200 250 95 180 170 150 


> Solution

1. The number of classes (7) is stated in the problem.

2. The minimum data entry is 59 and the maximum data entry is 450, so the
range is 450 - 59 = 391. Divide the range by the number of classes and
round up to find the class width.

$$\text{Class width} = \frac{\text{Range}}{\text{Number of classes}} = \frac{391}{7} \approx 55.86 \quad \text{(round up to 56)}$$

3. The minimum data entry is a convenient lower limit for the first class. To
find the lower limits of the remaining six classes, add the class width of 56
to the lower limit of each previous class. The upper limit of the first class
is 114, which is one less than the lower limit of the second class. The upper
limits of the other classes are 114 + 56 = 170, 170 + 56 = 226, and so on.
The lower and upper limits for all seven classes are shown.

Lower limit    Upper limit
59             114
115            170
171            226
227            282
283            338
339            394
395            450

4. Make a tally mark for each data entry in the appropriate class. For instance,
the data entry 130 is in the 115-170 class, so make a tally mark in that class.
Continue until you have made a tally mark for each of the 30 data entries.

5. The number of tally marks for a class is the frequency of that class.

The frequency distribution is shown in the following table. The first class,
59-114, has five tally marks. So, the frequency of this class is 5. Notice that the
sum of the frequencies is 30, which is the number of entries in the sample data
set. The sum is denoted by Σf, where Σ is the uppercase Greek letter sigma.


Frequency Distribution for Prices (in dollars) of GPS Navigators

Class (price)    Tally        Frequency, f (number of GPS navigators)
59-114           |||||        5
115-170          ||||| |||    8
171-226          ||||| |      6
227-282          |||||        5
283-338          ||           2
339-394          |            1
395-450          |||          3
                              Σf = 30

Check that the sum of the frequencies equals the number in the sample.


INSIGHT

If you obtain a whole number when calculating the class
width of a frequency distribution, use the next whole number
as the class width. Doing this ensures that you will have
enough space in your frequency distribution for all the data values.


STUDY TIP

The uppercase Greek letter sigma (Σ) is used throughout statistics to
indicate a summation of values.
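
The tallying in Steps 4 and 5 can be verified with a short program. The Python sketch below is an illustration only; it uses the 30 prices and the seven lower class limits from Example 1, and the variable names are chosen for this example.

```python
prices = [90, 130, 400, 200, 350, 70, 325, 250, 150, 250,
          275, 270, 150, 130, 59, 200, 160, 450, 300, 130,
          220, 100, 200, 400, 200, 250, 95, 180, 170, 150]

lower_limits = [59, 115, 171, 227, 283, 339, 395]
class_width = 56

# Count the entries in each class; both class limits are included.
frequencies = []
for lower in lower_limits:
    upper = lower + class_width - 1
    frequencies.append(sum(lower <= price <= upper for price in prices))

for lower, f in zip(lower_limits, frequencies):
    print(f"{lower}-{lower + class_width - 1}: {f}")

# The sum of the frequencies should equal the number of data entries.
print("Sum of frequencies:", sum(frequencies))
```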
> Try It Yourself 1


Construct a frequency distribution using the ages of the 50 richest people data
set listed in the Chapter Opener on page 37. Use eight classes.


a. State the number of classes.

b. Find the minimum and maximum values and the class width.

c. Find the class limits.

d. Tally the data entries.

e. Write the frequency f of each class.

Answer: Page A30


After constructing a standard frequency distribution such as the one in 
Example 1, you can include several additional features that will help provide 
a better understanding of the data. These features (the midpoint, relative 
frequency, and cumulative frequency of each class) can be included as 
additional columns in your table. 


DEFINITION 


The midpoint of a class is the sum of the lower and upper limits of the class 
divided by two. The midpoint is sometimes called the class mark. 


$$\text{Midpoint} = \frac{(\text{Lower class limit}) + (\text{Upper class limit})}{2}$$


The relative frequency of a class is the portion or percentage of the data that 
falls in that class. To find the relative frequency of a class, divide the frequency 
f by the sample size n. 

$$\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sample size}} = \frac{f}{n}$$


The cumulative frequency of a class is the sum of the frequencies of that class 
and all previous classes. The cumulative frequency of the last class is equal to 
the sample size n. 
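
The three definitions translate directly into code. The following is a minimal sketch, assuming the class limits and frequencies from the GPS navigator frequency distribution in Example 1; the variable names are illustrative only.

```python
from itertools import accumulate

# Class limits and frequencies taken from the Example 1 frequency distribution.
limits = [(59, 114), (115, 170), (171, 226), (227, 282),
          (283, 338), (339, 394), (395, 450)]
freqs = [5, 8, 6, 5, 2, 1, 3]

n = sum(freqs)  # sample size

# Midpoint: (lower limit + upper limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in limits]

# Relative frequency: f / n, rounded to two decimal places
relative = [round(f / n, 2) for f in freqs]

# Cumulative frequency: running total of the frequencies
cumulative = list(accumulate(freqs))

for row in zip(limits, freqs, midpoints, relative, cumulative):
    print(row)
```

The last cumulative frequency equals the sample size, which is a quick check that no class was skipped.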


After finding the first midpoint, you can find the remaining midpoints by 
adding the class width to the previous midpoint. For instance, if the first midpoint 
is 86.5 and the class width is 56, then the remaining midpoints are 


86.5 + 56 = 142.5 


142.5 + 56 = 198.5 


198.5 + 56 = 254.5 


254.5 + 56 = 310.5 


and so on. 

You can write the relative frequency as a fraction, decimal, or percent. The 
sum of the relative frequencies of all the classes should be equal to 1, or 100%. 
Due to rounding, the sum may be slightly less than or greater than 1. So, values 
such as 0.99 and 1.01 are sufficient.

EXAMPLE 2


> Finding Midpoints, Relative Frequencies, and Cumulative Frequencies

Using the frequency distribution constructed in Example 1, find the midpoint,
relative frequency, and cumulative frequency of each class. Identify any
patterns.


> Solution 


The midpoints, relative frequencies, and cumulative frequencies of the first 
three classes are calculated as follows. 


Class      f    Midpoint                  Relative frequency    Cumulative frequency
59-114     5    (59 + 114)/2 = 86.5       5/30 ≈ 0.17           5
115-170    8    (115 + 170)/2 = 142.5     8/30 ≈ 0.27           5 + 8 = 13
171-226    6    (171 + 226)/2 = 198.5     6/30 = 0.2            13 + 6 = 19


The remaining midpoints, relative frequencies, and cumulative frequencies are 
shown in the following expanded frequency distribution. 


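Frequency Distribution for Prices (in dollars) of GPS Navigators

Class      Frequency, f    Midpoint    Relative frequency    Cumulative frequency
59-114     5               86.5        0.17                  5
115-170    8               142.5       0.27                  13
171-226    6               198.5       0.2                   19
227-282    5               254.5       0.17                  24
283-338    2               310.5       0.07                  26
339-394    1               366.5       0.03                  27
395-450    3               422.5       0.1                   30
           Σf = 30                     Σ(f/n) ≈ 1

The classes are prices, the frequencies are numbers of GPS navigators, and the
relative frequencies are portions of GPS navigators.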


Interpretation There are several patterns in the data set. For instance, the 
most common price range for GPS navigators was $115 to $170. 


> Try It Yourself 2 


Using the frequency distribution constructed in Try It Yourself 1, find the
midpoint, relative frequency, and cumulative frequency of each class. Identify
any patterns.

a. Use the formulas to find each midpoint, relative frequency, and cumulative 
frequency. 

b. Organize your results in a frequency distribution. 

c. Identify patterns that emerge from the data. Answer: Page A31

Class       Class boundaries
59-114      58.5-114.5
115-170     114.5-170.5
171-226     170.5-226.5
227-282     226.5-282.5
283-338     282.5-338.5
339-394     338.5-394.5
395-450     394.5-450.5

INSIGHT

It is customary in bar graphs to have spaces between the bars, whereas with
histograms, it is customary that the bars have no spaces between them.

> GRAPHS OF FREQUENCY DISTRIBUTIONS


Sometimes it is easier to identify patterns of a data set by looking at a graph of 
the frequency distribution. One such graph is a frequency histogram. 


DEFINITION 


A frequency histogram is a bar graph that represents the frequency 
distribution of a data set. A histogram has the following properties. 

1. The horizontal scale is quantitative and measures the data values. 

2. The vertical scale measures the frequencies of the classes. 

3. Consecutive bars must touch. 


Because consecutive bars of a histogram must touch, bars must begin and 
end at class boundaries instead of class limits. Class boundaries are the numbers 
that separate classes without forming gaps between them. If data entries are 
integers, subtract 0.5 from each lower limit to find the lower class boundaries. To 
find the upper class boundaries, add 0.5 to each upper limit. The upper boundary 
of a class will equal the lower boundary of the next higher class. 
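
The rule for integer data can be checked with a short sketch. It assumes the class limits of the GPS navigator example; the function name `class_boundaries` is hypothetical.

```python
def class_boundaries(limits, half_gap=0.5):
    """Turn (lower, upper) class limits into class boundaries.

    half_gap is half the distance between one class's upper limit and the
    next class's lower limit; it is 0.5 when the data entries are integers.
    """
    return [(lower - half_gap, upper + half_gap) for lower, upper in limits]


limits = [(59, 114), (115, 170), (171, 226), (227, 282),
          (283, 338), (339, 394), (395, 450)]
boundaries = class_boundaries(limits)
print(boundaries)

# Each upper boundary should equal the lower boundary of the next class.
no_gaps = all(a[1] == b[0] for a, b in zip(boundaries, boundaries[1:]))
print(no_gaps)
```

The final check confirms that the boundaries leave no gaps between consecutive classes, which is what allows the histogram bars to touch.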


EXAMPLE 3 (SC Report 2)


> Constructing a Frequency Histogram 


Draw a frequency histogram for the frequency distribution in Example 2. 
Describe any patterns. 


> Solution 


First, find the class boundaries. Because the data entries are integers, 
subtract 0.5 from each lower limit to find the lower class boundaries and add 
0.5 to each upper limit to find the upper class boundaries. So, the lower and 
upper boundaries of the first class are as follows. 


First class lower boundary = 59 - 0.5 = 58.5

First class upper boundary = 114 + 0.5 = 114.5

The boundaries of the remaining classes are shown in the table. To construct
the histogram, choose possible frequency values for the vertical scale. You can
mark the horizontal scale either at the midpoints or at the class boundaries.
Both histograms are shown.


[Figure: two frequency histograms titled "Prices of GPS Navigators," one labeled
with class midpoints and one labeled with class boundaries. Horizontal axis:
Price (in dollars), drawn with a broken axis; vertical axis: Frequency (number
of GPS navigators).]


Interpretation From either histogram, you can see that more than half of the 
GPS navigators are priced below $226.50. 
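
Without plotting software, the same histogram can be approximated in text, and the interpretation can be verified numerically. This is only a sketch: it prints one row of `#` marks per class, turned on its side, and then counts the entries in classes whose upper boundary is at most 226.5.

```python
# Class boundaries and frequencies from the GPS navigator example.
boundaries = [58.5, 114.5, 170.5, 226.5, 282.5, 338.5, 394.5, 450.5]
freqs = [5, 8, 6, 5, 2, 1, 3]

# One text "bar" per class; consecutive rows share a boundary, so no gaps.
for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
    print(f"{lower:>6.1f}-{upper:<6.1f} {'#' * f} {f}")

# Number and share of entries priced below $226.50.
below = sum(f for upper, f in zip(boundaries[1:], freqs) if upper <= 226.5)
share = below / sum(freqs)
print(below, round(share, 2))
```

The count of 19 out of 30 entries is what supports the statement that more than half of the navigators are priced below $226.50.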
> Try It Yourself 3


Use the frequency distribution from Try It Yourself 2 to construct a frequency 
histogram that represents the ages of the 50 richest people. Describe any 
patterns. 


a. Find the class boundaries. 

b. Choose appropriate horizontal and vertical scales. 

c. Use the frequency distribution to find the height of each bar. 

d. Describe any patterns in the data. Answer: Page A31 


Another way to graph a frequency distribution is to use a frequency polygon. 
A frequency polygon is a line graph that emphasizes the continuous change in 
frequencies. 


EXAMPLE 4 


STUDY TIP

A histogram and its corresponding frequency polygon are often drawn
together. If you have not already constructed the histogram, begin
constructing the frequency polygon by choosing appropriate horizontal
and vertical scales. The horizontal scale should consist of the class
midpoints, and the vertical scale should consist of appropriate
frequency values.


> Constructing a Frequency Polygon 


Draw a frequency polygon for the frequency distribution in Example 2. 
Describe any patterns. 


> Solution 


To construct the frequency polygon, use the same horizontal and vertical scales 
that were used in the histogram labeled with class midpoints in Example 3. 
Then plot points that represent the midpoint and frequency of each class and 
connect the points in order from left to right. Because the graph should begin 
and end on the horizontal axis, extend the left side to one class width before 
the first class midpoint and extend the right side to one class width after the 
last class midpoint. 


[Figure: frequency polygon titled "Prices of GPS Navigators." Horizontal axis:
Price (in dollars), marked at 30.5, 86.5, 142.5, 198.5, 254.5, 310.5, 366.5,
422.5, and 478.5; vertical axis: Frequency (number of GPS navigators).]


Interpretation You can see that the frequency of GPS navigators increases 
up to $142.50 and then decreases. 
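
The points of the frequency polygon, including the two extra points that bring the graph down to the horizontal axis, can be listed with a short sketch. The variable names are illustrative; the midpoints, frequencies, and class width of 56 come from the example.

```python
width = 56
midpoints = [86.5, 142.5, 198.5, 254.5, 310.5, 366.5, 422.5]
freqs = [5, 8, 6, 5, 2, 1, 3]

# Extend one class width to the left and right with frequency 0 so that
# the polygon begins and ends on the horizontal axis.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + freqs + [0]
points = list(zip(xs, ys))
print(points)

# The highest point marks where the frequencies stop increasing.
peak = max(points, key=lambda p: p[1])
print(peak)
```

The first and last x-values, 30.5 and 478.5, match the end marks on the horizontal axis of the polygon.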


> Try It Yourself 4 


Use the frequency distribution from Try It Yourself 2 to construct a 
frequency polygon that represents the ages of the 50 richest people. Describe 
any patterns. 


a. Choose appropriate horizontal and vertical scales. 
b. Plot points that represent the midpoint and frequency of each class. 
c. Connect the points and extend the sides as necessary. 


d. Describe any patterns in the data. Answer: Page A31

A relative frequency histogram has the same shape and the same horizontal
scale as the corresponding frequency histogram. The difference is that the 
vertical scale measures the relative frequencies, not frequencies. 


PICTURING THE WORLD

Old Faithful, a geyser at Yellowstone National Park, erupts on a regular basis.
The time spans of a sample of eruptions are given in the relative frequency
histogram. (Source: Yellowstone National Park)

[Figure: relative frequency histogram titled "Old Faithful Eruptions."
Horizontal axis: Duration of eruption (in minutes), marked at 2.0, 2.6, 3.2,
3.8, and 4.4; vertical axis: Relative frequency.]

Fifty percent of the eruptions last less than how many minutes?


EXAMPLE 5


> Constructing a Relative Frequency Histogram

Draw a relative frequency histogram for the frequency distribution in
Example 2.


> Solution

The relative frequency histogram is shown. Notice that the shape of the
histogram is the same as the shape of the frequency histogram constructed in
Example 3. The only difference is that the vertical scale measures the relative
frequencies.

[Figure: relative frequency histogram titled "Prices of GPS Navigators."
Horizontal axis: Price (in dollars), marked at the class boundaries 58.5, 114.5,
170.5, 226.5, 282.5, 338.5, 394.5, and 450.5; vertical axis: Relative frequency
(portion of GPS navigators).]


Interpretation From this graph, you can quickly see that 0.27 or 27% of 
the GPS navigators are priced between $114.50 and $170.50, which is not as 
immediately obvious from the frequency histogram. 


> Try It Yourself 5 


Use the frequency distribution in Try It Yourself 2 to construct a relative 
frequency histogram that represents the ages of the 50 richest people. 


a. Use the same horizontal scale that was used in the frequency histogram in 
the Chapter Opener. 
b. Revise the vertical scale to reflect relative frequencies. 
c. Use the relative frequencies to find the height of each bar. 
Answer: Page A31 


If you want to describe the number of data entries that are equal to or below 
a certain value, you can easily do so by constructing a cumulative frequency graph. 


DEFINITION 


A cumulative frequency graph, or ogive (pronounced ō′jīve), is a line graph
that displays the cumulative frequency of each class at its upper class 
boundary. The upper boundaries are marked on the horizontal axis, and the 
cumulative frequencies are marked on the vertical axis. 




GUIDELINES 


Constructing an Ogive (Cumulative Frequency Graph) 
1. Construct a frequency distribution that includes cumulative frequencies 
as one of the columns. 


2. Specify the horizontal and vertical scales. The horizontal scale consists 
of upper class boundaries, and the vertical scale measures cumulative 
frequencies. 


3. Plot points that represent the upper class boundaries and their 
corresponding cumulative frequencies. 

4. Connect the points in order from left to right. 

5. The graph should start at the lower boundary of the first class 
(cumulative frequency is zero) and should end at the upper boundary 
of the last class (cumulative frequency is equal to the sample size). 
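
The guidelines amount to pairing each upper class boundary with its cumulative frequency and adding a starting point at the lower boundary of the first class. A minimal sketch, assuming the GPS navigator classes used in this section:

```python
from itertools import accumulate

lower_boundary_first_class = 58.5
upper_boundaries = [114.5, 170.5, 226.5, 282.5, 338.5, 394.5, 450.5]
freqs = [5, 8, 6, 5, 2, 1, 3]

# Start at the lower boundary of the first class with cumulative frequency 0,
# then plot each upper boundary against its cumulative frequency.
ogive_points = [(lower_boundary_first_class, 0)]
ogive_points += list(zip(upper_boundaries, accumulate(freqs)))

for x, y in ogive_points:
    print(x, y)
```

Connecting these points from left to right gives the ogive; the last point should sit at the sample size.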


EXAMPLE 6 


> Constructing an Ogive 


Draw an ogive for the frequency distribution in Example 2. Estimate how 
many GPS navigators cost $300 or less. Also, use the graph to estimate when 
the greatest increase in price occurs. 


> Solution 


Using the cumulative frequencies, you can construct the ogive shown. The upper 
class boundaries, frequencies, and cumulative frequencies are shown in the 
table. Notice that the graph starts at 58.5, where the cumulative frequency is 0, 
and the graph ends at 450.5, where the cumulative frequency is 30. 


[Figure: ogive titled "Prices of GPS Navigators." Horizontal axis: Price (in
dollars), marked at the class boundaries 58.5, 114.5, 170.5, 226.5, 282.5,
338.5, 394.5, and 450.5; vertical axis: Cumulative frequency (number of GPS
navigators).]


Interpretation From the ogive, you can see that about 25 GPS navigators 
cost $300 or less. It is evident that the greatest increase occurs between 
$114.50 and $170.50, because the line segment is steepest between these two 
class boundaries. 
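
Reading a value off the ogive is the same as interpolating linearly along one of its line segments. The sketch below makes that explicit; the function name `ogive_value` is hypothetical, and the interpolation assumes that prices are spread evenly within each class, which is what drawing straight segments implies.

```python
from itertools import accumulate

boundaries = [58.5, 114.5, 170.5, 226.5, 282.5, 338.5, 394.5, 450.5]
freqs = [5, 8, 6, 5, 2, 1, 3]
points = list(zip(boundaries, [0] + list(accumulate(freqs))))


def ogive_value(x, pts):
    """Estimate the cumulative frequency at x by linear interpolation."""
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the class boundaries")


estimate = ogive_value(300, points)
print(estimate)

# The steepest segment is the one with the largest rise per unit of price.
segments = list(zip(points, points[1:]))
steepest = max(segments, key=lambda s: (s[1][1] - s[0][1]) / (s[1][0] - s[0][0]))
steepest_interval = (steepest[0][0], steepest[1][0])
print(steepest_interval)
```

The interpolated value is 24.625, which rounds to the "about 25" read from the graph, and the steepest segment runs between the boundaries 114.5 and 170.5.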


Another type of ogive uses percent as the vertical axis instead of frequency
(see Example 5 in Section 2.5).

STUDY TIP

Detailed instructions for using MINITAB, Excel, and the TI-83/84 Plus are
shown in the Technology Guide that accompanies this text. For instance,
here are instructions for creating a histogram on a TI-83/84 Plus.

[STAT] [ENTER]
Enter midpoints in L1.
Enter frequencies in L2.

[2nd] [STATPLOT]
Turn on Plot 1.
Highlight Histogram.
Xlist: L1
Freq: L2

[ZOOM] [9]
[WINDOW]
Xscl=56
[GRAPH]


> Try It Yourself 6 


Use the frequency distribution from Try It Yourself 2 to construct an ogive that 
represents the ages of the 50 richest people. Estimate the number of people 
who are 80 years old or younger. 


a. Specify the horizontal and vertical scales. 

b. Plot the points given by the upper class boundaries and the cumulative 
frequencies. 

c. Construct the graph. 

d. Estimate the number of people who are 80 years old or younger. 

e. Interpret the results in the context of the data. Answer: Page A31 


EXAMPLE 7 


> Using Technology to Construct Histograms


Use a calculator or a computer to construct a histogram for the frequency 
distribution in Example 2. 


> Solution 


MINITAB, Excel, and the TI-83/84 Plus each have features for graphing 
histograms. Try using this technology to draw the histograms as shown. 


[Figure: technology-generated frequency histograms of the GPS navigator prices,
with panels labeled MINITAB and TI-83/84 PLUS. Horizontal axis: Price (in
dollars), marked at the class midpoints 86.5, 142.5, 198.5, 254.5, 310.5, 366.5,
and 422.5; vertical axis: Frequency.]


> Try It Yourself 7 


Use a calculator or a computer and the frequency distribution from Try It 
Yourself 2 to construct a frequency histogram that represents the ages of the 
50 richest people. 


a. Enter the data.
b. Construct the histogram. Answer: Page A31

BUILDING BASIC SKILLS AND VOCABULARY


1. What are some benefits of representing data sets using frequency
distributions? What are some benefits of using graphs of frequency distributions?


2. Why should the number of classes in a frequency distribution be between 
5 and 20? 


3. What is the difference between class limits and class boundaries? 
4. What is the difference between relative frequency and cumulative frequency? 


5. After constructing an expanded frequency distribution, what should the sum 
of the relative frequencies be? Explain. 


6. What is the difference between a frequency polygon and an ogive? 


True or False? In Exercises 7-10, determine whether the statement is true or
false. If it is false, rewrite it as a true statement.


7. In a frequency distribution, the class width is the distance between the lower
and upper limits of a class.


8. The midpoint of a class is the sum of its lower and upper limits divided by two.
9. An ogive is a graph that displays relative frequencies.


10. Class boundaries are used to ensure that consecutive bars of a histogram touch. 


In Exercises 11-14, use the given minimum and maximum data entries and the 
number of classes to find the class width, the lower class limits, and the upper 
class limits. 


11. min = 9, max = 64, 7 classes
12. min = 12, max = 88, 6 classes
13. min = 17, max = 135, 8 classes
14. min = 54, max = 247, 10 classes

Reading a Frequency Distribution In Exercises 15 and 16, use the given
frequency distribution to find the (a) class width, (b) class midpoints, and
(c) class boundaries.


15. Cleveland, OH High Temperatures (°F)

Class      Frequency, f
20-30      19
31-41      43
42-52      68
53-63      69
64-74      74
75-85      68
86-96      24

16. Travel Time to Work (in minutes)

Class      Frequency, f
0-9        188
10-19      372
20-29      264
30-39      205
40-49      83
50-59      76
60-69      32

17. Use the frequency distribution in Exercise 15 to construct an expanded 
frequency distribution, as shown in Example 2. 


18. Use the frequency distribution in Exercise 16 to construct an expanded 
frequency distribution, as shown in Example 2. 




Graphical Analysis In Exercises 19 and 20, use the frequency histogram to
(a) determine the number of classes. 
(b) estimate the frequency of the class with the least frequency. 
(c) estimate the frequency of the class with the greatest frequency. 


(d) determine the class width. 


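19. Employee Salaries
[Figure: frequency histogram. Horizontal axis: Salary (in thousands of
dollars); vertical axis: Frequency.]

20. Tree Heights
[Figure: frequency histogram. Horizontal axis: Height (in inches), marked at
18, 23, 28, 33, 38, 43, and 48; vertical axis: Frequency.]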


Graphical Analysis In Exercises 21 and 22, use the ogive to approximate 
(a) the number in the sample. 


(b) the location of the greatest increase in frequency. 


21. Male Beagles
[Figure: ogive. Horizontal axis: Weight (in pounds), marked at 18.5, 21.5, 24.5,
27.5, 30.5, and 33.5; vertical axis: Cumulative frequency, marked from 5 to 55
in steps of 5.]

22. Adult Females, Ages 20-29
[Figure: ogive. Horizontal axis: Height (in inches), marked from 58 to 74 in
steps of 2; vertical axis: Cumulative frequency, marked from 5 to 55 in steps
of 5.]


23. Use the ogive in Exercise 21 to approximate 


(a) the cumulative frequency for a weight of 27.5 pounds. 
(b) the weight for which the cumulative frequency is 45. 


(c) the number of beagles that weigh between 22.5 pounds 
and 29.5 pounds. 


(d) the number of beagles that weigh more than 30.5 pounds. 


24. Use the ogive in Exercise 22 to approximate 


(a) the cumulative frequency for a height of 72 inches. 
(b) the height for which the cumulative frequency is 25. 


(c) the number of adult females that are between 62 and 
66 inches tall. 


(d) the number of adult females that are taller than 70 inches.

Graphical Analysis In Exercises 25 and 26, use the relative frequency
histogram to


(a) identify the class with the greatest, and the class with the least, relative frequency. 
(b) approximate the greatest and least relative frequencies. 


(c) approximate the relative frequency of the second class. 


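25. Atlantic Croaker Fish
[Figure: relative frequency histogram. Horizontal axis: Length (in inches),
marked from 5.5 to 17.5 in steps of 2; vertical axis: Relative frequency.]

26. Emergency Response Times
[Figure: relative frequency histogram. Horizontal axis: Time (in minutes),
marked from 17.5 to 21.5 in steps of 1; vertical axis: Relative frequency.]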


Graphical Analysis In Exercises 27 and 28, use the frequency polygon to
identify the class with the greatest, and the class with the least, frequency.


27. Raw MCAT Scores for 60 Applicants
[Figure: frequency polygon. Horizontal axis: Score, marked from 10 to 43 in
steps of 3; vertical axis: Frequency.]

28. Shoe Sizes for 50 Females
[Figure: frequency polygon. Vertical axis: Frequency.]


USING AND INTERPRETING CONCEPTS


Constructing a Frequency Distribution In Exercises 29 and 30, construct 
a frequency distribution for the data set using the indicated number of classes. In 
the table, include the midpoints, relative frequencies, and cumulative frequencies. 
Which class has the greatest frequency and which has the least frequency? 


* 29. Political Blog Reading Times
Number of classes: 5 
Data set: Time (in minutes) spent reading a political blog in a day 


7 39 13 9 25 8 22 0 2 18 2 30 7 
35 12 15 8 6 5 29 0 11 39 16 15 


* 30. Book Spending
Number of classes: 6 
Data set: Amount (in dollars) spent on books for a semester 


91 472 279 249 530 376 188 341 266 199 
142 273 189 130 489 266 248 101 375 486 
190 398 188 269 43 30 127 354 84 


* indicates that the data set for this exercise is available electronically.

Constructing a Frequency Distribution and a Frequency Histogram
In Exercises 31-34, construct a frequency distribution and a frequency histogram
for the data set using the indicated number of classes. Describe any patterns.


* 31. Sales
Number of classes: 6 
Data set: July sales (in dollars) for all sales representatives at 
a company 


2114 2468 7119 1876 4105 3183 1932 1355 
4278 1030 2000 1077 5835 1512 1697 2478 
3981 1643 1858 1500 4608 1000


* 32. Pepper Pungencies
Number of classes: 5 
Data set: Pungencies (in 1000s of Scoville units) of 24 tabasco 
peppers 


35 51 44 42 37 38 36 39 
44 43 40 40 32 39 41 38 
42 39 40 46 37 35 41 39


* 33. Reaction Times
Number of classes: 8 
Data set: Reaction times (in milliseconds) of a sample of 30 adult 
females to an auditory stimulus 


507 389 305 291 336 310 514 442 
373 428 387 454 323 441 388 426 
411 382 320 450 309 416 359 388 
307 337 469 351 422 413


* 34. Fracture Times
Number of classes: 5 
Data set: Amounts of pressure (in pounds per square inch) at fracture 
time for 25 samples of brick mortar 


2750 2862 2885 2490 2512 2456 2554 
2872 2601 2877 2721 2692 2888 2755 
2867 2718 2641 2834 2466 2596 2519 
2532 2885 2853 2517 


Constructing a Frequency Distribution and a Relative Frequency 
Histogram In Exercises 35-38, construct a frequency distribution and a relative 
frequency histogram for the data set using five classes. Which class has the greatest 
relative frequency and which has the least relative frequency? 


* 35. Gasoline Consumption
Data set: Highway fuel consumptions (in miles per gallon) for a sample 
of cars 


32 35 28 40 30 42 55 40 45 24 
28 34 40 36 34 40 30 25 28 32 
40 35 25 44 26 39 38 42 45 32


* 36. ATM Withdrawals
Data set: A sample of ATM withdrawals (in dollars) 


35 10 30 25 75 10 30 20 20 10 40 
50 40 30 60 70 25 40 10 60 20 80 
40 25 20 10 20 25 30 50 80 20

* 37. Triglyceride Levels
Data set: Triglyceride levels (in milligrams per deciliter of blood) of a 
sample of patients 


209 140 155 170 265 138 180 295 250 
320 270 225 215 390 420 462 150 200 
400 295 240 200 190 145 160 175 


* 38. Years of Service 
Data set: Years of service of a sample of New York state troopers 


12 7 9 8 9 8 12 10 9
10 6 8 13 12 10 11 7 14 
12 9 8 10 9 11 13 8 


Constructing a Cumulative Frequency Distribution and an Ogive 
In Exercises 39 and 40, construct a cumulative frequency distribution and an ogive 
for the data set using six classes. Then describe the location of the greatest increase 
in frequency.


* 39. Retirement Ages
Data set: Retirement ages for a sample of doctors 


70 54 55 71 57 58 63 65 
60 66 57 62 63 60 63 60 
66 60 67 69 69 52 61 73


* 40. Saturated Fat Intakes
Data set: Daily saturated fat intakes (in grams) of a sample of people 


38 32 34 39 40 54 32 17 29 33 
57 40 25 36 33 24 42 16 31 33 


Constructing a Frequency Distribution and a Frequency Polygon 
In Exercises 41 and 42, construct a frequency distribution and a frequency polygon 
for the data set. Describe any patterns. 


* 41. Exam Scores
Number of classes: 5 
Data set: Exam scores for all students in a statistics class 


83 92 94 82 73 98 78 85 72 90 
89 92 96 89 75 85 63 47 75 82 


* 42. Children of the Presidents 
Number of classes: 6 
Data set: Number of children of the U.S. presidents 
(Source: presidentschildren.com) 


0 5 6 0 3 4 0 4 10 15 0 6 2 3 0
4 5 4 8 7 3 5 3 2 6 3 3 1 2
2 6 1 2 3 2 2 4 4 4 6 1 2 2


In Exercises 43 and 44, use the data set to construct (a) an expanded frequency 
distribution, (b) a frequency histogram, (c) a frequency polygon, (d) a relative 
frequency histogram, and (e) an ogive. 


* 43. Pulse Rates
Number of classes: 6 
Data set: Pulse rates of students in a class 


68 105 95 80 90 100 75 70 84 98 102 70 
65 88 90 75 78 94 110 120 95 80 76 108 




* 44. Hospitals
Number of classes: 8
Data set: Number of hospitals in each state (Source: American
Hospital Directory)


15 100 56 74 360 53 34 8 213 116
29 121 378 36 91 7 61 71 40 15


45. Use StatCrunch to construct a frequency histogram and a relative
frequency histogram for the following data set that shows the finishing
times (in minutes) for 25 runners in a marathon. Use seven classes.


159 164 165 170 215 200 167 225 192 185 235 240 225
191 194 175 167 234 158 172 180 240 176 159 231


* 46. Writing What happens when the number of classes is increased for a
frequency histogram? Use the data set listed and a technology tool to
create frequency histograms with 5, 10, and 20 classes. Which graph
displays the data best?


2 7 3 2 11 3 15 8 4 9 10 13 9
7 11 10 1 2 12 5 6 4 2 9 15


EXTENDING CONCEPTS


* 47. What Would You Do? You work at a bank and are asked to
recommend the amount of cash to put in an ATM each day. You don’t
want to put in too much (security) or too little (customer irritation).
Here are the daily withdrawals (in 100s of dollars) for 30 days.


72 84 61 76 104 76 86 92 80 88
98 76 97 82 84 67 70 81 82 89
74 73 86 81 8 78 82 80 91 83


(a) Construct a relative frequency histogram for the data using 8 classes.


(b) If you put $9000 in the ATM each day, what percent of the days
in a month should you expect to run out of cash? Explain your
reasoning.


(c) If you are willing to run out of cash for 10% of the days, how much
cash should you put in the ATM each day? Explain your reasoning.


48. What Would You Do? You work in the admissions department for a
college and are asked to recommend the minimum SAT scores that the
college will accept for a position as a full-time student. Here are the
SAT scores for a sample of 50 applicants.


1760 1502 1375 1310 1601 1942 1380 2211 1622 1771 
1150 1351 1682 1618 2051 1742 1463 1395 1860 1918 
1882 1996 1525 1510 2120 1700 1818 1869 1440 1235 

976 1513 1790 2250 2102 1905 1979 1588 1420 1730 
2175 1930 1965 1658 2005 2125 1260 1560 1635 1620 


(a) Construct a relative frequency histogram for the data using 
10 classes. 


(b) If you set the minimum score at 1616, what percent of the applicants 
will meet this requirement? Explain your reasoning. 


(c) If you want to accept the top 88% of the applicants, what should the 
minimum score be? Explain your reasoning.

WHAT YOU SHOULD LEARN


> How to graph and interpret quantitative data sets using stem-and-leaf
plots and dot plots

> How to graph and interpret qualitative data sets using pie charts and
Pareto charts

> How to graph and interpret paired data sets using scatter plots and
time series charts


STUDY TIP


It is important to include a key for a stem-and-leaf plot to identify the
values of the data. This is done by showing a value represented by a stem
and one leaf.

More Graphs and Displays


Graphing Quantitative Data Sets > Graphing Qualitative Data Sets >
Graphing Paired Data Sets


> GRAPHING QUANTITATIVE DATA SETS


In Section 2.1, you learned several traditional ways to display quantitative data 
graphically. In this section, you will learn a newer way to display quantitative 
data, called a stem-and-leaf plot. Stem-and-leaf plots are examples of exploratory 
data analysis (EDA), which was developed by John Tukey in 1977. 

In a stem-and-leaf plot, each number is separated into a stem (for instance, 
the entry’s leftmost digits) and a leaf (for instance, the rightmost digit). You 
should have as many leaves as there are entries in the original data set and the 
leaves should be single digits. A stem-and-leaf plot is similar to a histogram but 
has the advantage that the graph still contains the original data values. Another 
advantage of a stem-and-leaf plot is that it provides an easy way to sort data. 


EXAMPLE 1 (SC Report 4)


> Constructing a Stem-and-Leaf Plot


The following are the numbers of text messages sent last week by the cellular 
phone users on one floor of a college dormitory. Display the data in a 
stem-and-leaf plot. What can you conclude? 


155 159 144 129 105 145 126 116 130 114 122 112 112 142 
126 118 118 108 122 121 109 140 126 119 113 117 118 109 
109 119 139 139 122 78 133 126 123 145 121 134 124 119 
132 133 124 129 112 126 148 147 


> Solution Because the data entries go from a low of 78 to a high of 159, 
you should use stem values from 7 to 15. To construct the plot, list these 
stems to the left of a vertical line. For each data entry, list a leaf to the right 
of its stem. For instance, the entry 155 has a stem of 15 and a leaf of 5. 
The resulting stem-and-leaf plot will be unordered. To obtain an ordered 
stem-and-leaf plot, rewrite the plot with the leaves in increasing order from 
left to right. Be sure to include a key. 


Number of Text Messages Sent
Unordered Stem-and-Leaf Plot
Key: 15|5 = 155

 7 | 8
 8 |
 9 |
10 | 58999
11 | 6422889378992
12 | 962621626314496
13 | 0993423
14 | 4520587
15 | 59

Number of Text Messages Sent
Ordered Stem-and-Leaf Plot
Key: 15|5 = 155

 7 | 8
 8 |
 9 |
10 | 58999
11 | 2223467888999
12 | 112223446666699
13 | 0233499
14 | 0245578
15 | 59


Interpretation From the display, you can conclude that more than 50% of the 
cellular phone users sent between 110 and 130 text messages. 
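
The construction in Example 1 can be sketched in Python. This is an illustration, not the book's method: the function name `stem_and_leaf` is hypothetical, and it assumes integer data whose leaf is the final digit.

```python
from collections import defaultdict

# Text messaging data from Example 1.
data = [
    155, 159, 144, 129, 105, 145, 126, 116, 130, 114, 122, 112, 112, 142,
    126, 118, 118, 108, 122, 121, 109, 140, 126, 119, 113, 117, 118, 109,
    109, 119, 139, 139, 122, 78, 133, 126, 123, 145, 121, 134, 124, 119,
    132, 133, 124, 129, 112, 126, 148, 147,
]


def stem_and_leaf(values, ordered=True):
    """Return (stem, leaves) rows; the leaf is the rightmost digit."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # Include every stem from the smallest to the largest, even empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        row = leaves.get(stem, [])
        if ordered:
            row = sorted(row)
        rows.append((stem, "".join(str(d) for d in row)))
    return rows


ordered_rows = stem_and_leaf(data)
print("Key: 15|5 = 155")
for stem, row in ordered_rows:
    print(f"{stem:>2} | {row}")

# Share of entries on stems 11 and 12 (values from 110 through 129).
plot = dict(ordered_rows)
share = (len(plot[11]) + len(plot[12])) / len(data)
print(share)
```

Stems 11 and 12 hold 28 of the 50 entries, which is the basis for the "more than 50%" statement in the interpretation.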

INSIGHT


You can use stem-and-leaf plots to identify unusual data values called
outliers. In Examples 1 and 2, the data value 78 is an outlier. You will
learn more about outliers in Section 2.3.

> Try It Yourself 1


Use a stem-and-leaf plot to organize the ages of the 50 richest people data set 
listed in the Chapter Opener on page 37. What can you conclude? 


a. List all possible stems. 

b. List the leaf of each data entry to the right of its stem and include a key.

c. Rewrite the stem-and-leaf plot so that the leaves are ordered. 

d. Use the plot to make a conclusion. Answer: Page A31 


EXAMPLE 2 


> Constructing Variations of Stem-and-Leaf Plots 


Organize the data given in Example 1 using a stem-and-leaf plot that has two 
rows for each stem. What can you conclude? 


> Solution 


Use the stem-and-leaf plot from Example 1, except now list each stem twice. 
Use the leaves 0, 1, 2, 3, and 4 in the first stem row and the leaves 5, 6, 7, 8, and
9 in the second stem row. The revised stem-and-leaf plot is shown. Notice that
by using two rows per stem, you obtain a more detailed picture of the data. 


Number of Text Messages Sent
Unordered Stem-and-Leaf Plot
Key: 15|5 = 155

 7 |
 7 | 8
 8 |
 8 |
 9 |
 9 |
10 |
10 | 58999
11 | 42232
11 | 68897899
12 | 22123144
12 | 9666696
13 | 03423
13 | 99
14 | 420
14 | 5587
15 |
15 | 59

Number of Text Messages Sent
Ordered Stem-and-Leaf Plot
Key: 15|5 = 155

 7 |
 7 | 8
 8 |
 8 |
 9 |
 9 |
10 |
10 | 58999
11 | 22234
11 | 67888999
12 | 11222344
12 | 6666699
13 | 02334
13 | 99
14 | 024
14 | 5578
15 |
15 | 59


Interpretation From the display, you can conclude that most of the cellular 
phone users sent between 105 and 135 text messages. 
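
The two-rows-per-stem variation needs only one extra decision per entry: whether its leaf belongs in the first row (0 through 4) or the second row (5 through 9). A sketch under the same assumptions as before (integer data, last digit as the leaf); the function name `split_stem_and_leaf` is hypothetical.

```python
# Text messaging data from Example 1.
data = [
    155, 159, 144, 129, 105, 145, 126, 116, 130, 114, 122, 112, 112, 142,
    126, 118, 118, 108, 122, 121, 109, 140, 126, 119, 113, 117, 118, 109,
    109, 119, 139, 139, 122, 78, 133, 126, 123, 145, 121, 134, 124, 119,
    132, 133, 124, 129, 112, 126, 148, 147,
]


def split_stem_and_leaf(values):
    """Ordered stem-and-leaf rows with two rows per stem."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1  # 0: leaves 0-4, 1: leaves 5-9
        rows.setdefault((stem, half), []).append(leaf)
    first, last = min(values) // 10, max(values) // 10
    return [
        (stem, "".join(str(d) for d in rows.get((stem, half), [])))
        for stem in range(first, last + 1)
        for half in (0, 1)
    ]


split_rows = split_stem_and_leaf(data)
print("Key: 15|5 = 155")
for stem, row in split_rows:
    print(f"{stem:>2} | {row}")
```

Each stem from 7 through 15 appears twice, so the plot has 18 rows, several of them empty.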


> Try It Yourself 2 


Using two rows for each stem, revise the stem-and-leaf plot you constructed in 
Try It Yourself 1. What can you conclude? 


a. List each stem twice. 
b. List all leaves using the appropriate stem row. 


c. Use the plot to make a conclusion. Answer: Page A32

You can also use a dot plot to graph quantitative data. In a dot plot, each data
entry is plotted, using a point, above a horizontal axis. Like a stem-and-leaf plot, 
a dot plot allows you to see how data are distributed, determine specific data 
entries, and identify unusual data values. 


EXAMPLE 3 (SC Report 5)


> Constructing a Dot Plot 


Use a dot plot to organize the text messaging data given in Example 1. What 
can you conclude from the graph? 


155 159 144 129 105 145 126 116 130 114 122 112
112 142 126 118 118 108 122 121 109 140 126 119
113 117 118 109 109 119 139 139 122 78 133 126
123 145 121 134 124 119 132 133 124 129 112 126
148 147


> Solution 

So that each data entry is included in the dot plot, the horizontal axis should 
include numbers between 70 and 160. To represent a data entry, plot a point 
above the entry’s position on the axis. If an entry is repeated, plot another 
point above the previous point. 


Number of Text Messages Sent

[Dot plot: horizontal axis from 70 to 160, labeled in steps of 5]


Interpretation From the dot plot, you can see that most values cluster 
between 105 and 148 and the value that occurs the most is 126. You can also 
see that 78 is an unusual data value. 
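
As a rough check on this reading of the dot plot, the height of each column of dots can be tallied in Python. The sketch below is an illustration only; it prints a sideways text version of the plot, one row per distinct value, rather than reproducing the figure.

```python
from collections import Counter

# Text messaging data from Example 1.
data = [
    155, 159, 144, 129, 105, 145, 126, 116, 130, 114, 122, 112,
    112, 142, 126, 118, 118, 108, 122, 121, 109, 140, 126, 119,
    113, 117, 118, 109, 109, 119, 139, 139, 122, 78, 133, 126,
    123, 145, 121, 134, 124, 119, 132, 133, 124, 129, 112, 126,
    148, 147,
]

# Each distinct value gets one dot per occurrence.
counts = Counter(data)
most_common_value, height = counts.most_common(1)[0]

for value in sorted(counts):
    print(f"{value:>3} " + "." * counts[value])

print("Tallest column:", most_common_value, "with", height, "dots")
print("Smallest entry:", min(data))
```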


> Try It Yourself 3 


Use a dot plot to organize the ages of the 50 richest people data set listed in 
the Chapter Opener on page 37. What can you conclude from the graph? 


a. Choose an appropriate scale for the horizontal axis. 
b. Represent each data entry by plotting a point. 
c. Describe any patterns in the data. Answer: Page A32 


Technology can be used to construct stem-and-leaf plots and dot plots. 
For instance, a MINITAB dot plot for the text messaging data is shown below. 


[MINITAB dot plot: Number of Text Messages Sent; horizontal axis from 80 to 160, labeled in steps of 10]


Earned Degrees Conferred in 2007

Type of degree        Number (in thousands)
Associate’s           728
Bachelor’s            1525
Master’s              604
First professional    90
Doctoral              60

STUDY TIP 


Here are instructions for 
constructing a pie chart using 
Excel. First, enter the degree 
types and their corresponding 
frequencies or relative frequencies 
in two separate columns. Then 
highlight the two columns, click 
on the Chart Wizard, and select 
Pie as your chart type. Click Next 
throughout the Chart Wizard 
while constructing your pie chart. 


[Pie chart: Earned Degrees Conferred in 2007. Bachelor's 51%, Associate's 24%, Master's 20%, First professional 3%, Doctoral 2%]


> GRAPHING QUALITATIVE DATA SETS


Pie charts provide a convenient way to present qualitative data graphically as 
percents of a whole. A pie chart is a circle that is divided into sectors that 
represent categories. The area of each sector is proportional to the frequency of 
each category. In most cases, you will be interpreting a pie chart or constructing 
one using technology. Example 4 shows how to construct a pie chart by hand. 


EXAMPLE 4 Report 6


> Constructing a Pie Chart 


The numbers of earned degrees conferred (in thousands) in 2007 are shown 
in the table. Use a pie chart to organize the data. What can you conclude? 
(Source: U.S. National Center for Education Statistics) 


> Solution 


Begin by finding the relative frequency, or percent, of each category. Then 
construct the pie chart using the central angle that corresponds to each 
category. To find the central angle, multiply 360° by the category’s relative 
frequency. For instance, the central angle for associate’s degrees is 
360°(0.24) ≈ 86°. To construct a pie chart in Excel, follow the instructions in
the margin. 


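Earned Degrees Conferred in 2007

Type of degree        f       Relative frequency    Angle
Associate’s           728     0.24                  86°
Bachelor’s            1525    0.51                  184°
Master’s              604     0.20                  72°
First professional    90      0.03                  11°
Doctoral              60      0.02                  7°

[Pie chart: Earned Degrees Conferred in 2007. Bachelor’s 51%, Associate’s 24%, Master’s 20%, First professional 3%, Doctoral 2%]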


Interpretation From the pie chart, you can see that over one half of the 
degrees conferred in 2007 were bachelor’s degrees. 
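
The relative frequencies and central angles can also be computed directly. The following Python sketch is an illustration under one stated assumption: it rounds each relative frequency to two decimal places before multiplying by 360°, which matches the worked value 360°(0.24) ≈ 86° for associate's degrees.

```python
# Earned degrees conferred in 2007 (in thousands), from the table in Example 4.
degrees = {
    "Associate's": 728,
    "Bachelor's": 1525,
    "Master's": 604,
    "First professional": 90,
    "Doctoral": 60,
}

total = sum(degrees.values())

angles = {}
for name, frequency in degrees.items():
    # Relative frequency, rounded to two decimal places as in the worked example.
    relative = round(frequency / total, 2)
    # Central angle of the sector, in degrees.
    angles[name] = round(360 * relative)
    print(f"{name:<20}{frequency:>6}{relative:>7.2f}{angles[name]:>5} degrees")
```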
> Try It Yourself 4 


The numbers of earned degrees conferred (in thousands) in 1990 are shown in 
the table. Use a pie chart to organize the data. Compare the 1990 data with the 
2007 data. (Source: U.S. National Center for Education Statistics) 


Earned Degrees Conferred in 1990 


Associate’s 455 
Bachelor’s 1052 
Master’s 325 
First professional 71 
Doctoral 38 


a. Find the relative frequency and central angle of each category. 
b. Use the central angle to find the portion that corresponds to each category. 
c. Compare the 1990 data with the 2007 data. Answer: Page A32




The six top-selling video games 
of 2009 through November are 
shown in the following Pareto 
chart. Publishers are in 
parentheses. (Source: NPD Group) 


[Pareto chart: Six Top-Selling Video Games of 2009 Through November]

Of the six top-selling video games, how many units did Nintendo sell?

Another way to graph qualitative data is to use a Pareto chart. A Pareto
chart is a vertical bar graph in which the height of each bar represents frequency 
or relative frequency. The bars are positioned in order of decreasing height, with 
the tallest bar positioned at the left. Such positioning helps highlight important 
data and is used frequently in business. 


EXAMPLE 5 Report 7


> Constructing a Pareto Chart


In a recent year, the retail industry lost $36.5 billion in inventory shrinkage. 
Inventory shrinkage is the loss of inventory through breakage, pilferage, 
shoplifting, and so on. The main causes of inventory shrinkage are administrative 
error ($5.4 billion), employee theft ($15.9 billion), shoplifting ($12.7 billion), 
and vendor fraud ($1.4 billion). If you were a retailer, which causes of inventory 
shrinkage would you address first? (Source: National Retail Federation and the 
University of Florida) 


> Solution 
Using frequencies for the vertical axis, you can construct the Pareto chart as 
shown. 

[Pareto chart: Main Causes of Inventory Shrinkage. Vertical axis: loss in billions of dollars; horizontal axis: Cause, with bars in the order Employee theft, Shoplifting, Administrative error, Vendor fraud]


Interpretation From the graph, it is easy to see that the causes of inventory 
shrinkage that should be addressed first are employee theft and shoplifting. 
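
The ordering step behind a Pareto chart is just a sort by decreasing frequency. The sketch below is an illustration only, using the four dollar amounts stated in Example 5; the text bars are a crude stand-in for the figure.

```python
# Main causes of inventory shrinkage (in billions of dollars), from Example 5.
losses = {
    "Administrative error": 5.4,
    "Employee theft": 15.9,
    "Shoplifting": 12.7,
    "Vendor fraud": 1.4,
}

# A Pareto chart positions the bars in order of decreasing height.
ordered = sorted(losses.items(), key=lambda item: item[1], reverse=True)

for cause, amount in ordered:
    bar = "#" * round(amount)
    print(f"{cause:<22}{amount:>5.1f} {bar}")
```

The first two rows printed are the causes a retailer would address first.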


> Try It Yourself 5 


Every year, the Better Business Bureau (BBB) receives complaints from 
customers. In a recent year, the BBB received the following complaints. 


7792 complaints about home furnishing stores 

5733 complaints about computer sales and service stores 
14,668 complaints about auto dealers 

9728 complaints about auto repair shops 

4649 complaints about dry cleaning companies 


Use a Pareto chart to organize the data. What source is the greatest cause of 
complaints? (Source: Council of Better Business Bureaus) 


a. Find the frequency or relative frequency for each data entry. 

b. Position the bars in decreasing order according to frequency or relative 
frequency. 

c. Interpret the results in the context of the data. Answer: Page A32 

32,000
32,500 
40,000 
27,350 
25,000 
43,000 
41,650 
39,225 
45,100 
28,000 


> GRAPHING PAIRED DATA SETS


When each entry in one data set corresponds to one entry in a second data set, 
the sets are called paired data sets. For instance, suppose a data set contains the 
costs of an item and a second data set contains sales amounts for the item at each 
cost. Because each cost corresponds to a sales amount, the data sets are paired. 
One way to graph paired data sets is to use a scatter plot, where the ordered pairs 
are graphed as points in a coordinate plane. A scatter plot is used to show the 
relationship between two quantitative variables. 


EXAMPLE 6 


> Interpreting a Scatter Plot 


The British statistician Ronald Fisher (see page 33) introduced a famous data 
set called Fisher’s Iris data set. This data set describes various physical 
characteristics, such as petal length and petal width (in millimeters), for three 
species of iris. In the scatter plot shown, the petal lengths form the first data set 
and the petal widths form the second data set. As the petal length increases, 
what tends to happen to the petal width? (Source: Fisher, R. A., 1936) 


[Scatter plot: Fisher’s Iris Data Set. Horizontal axis: Petal length (in millimeters), labeled from 10 to 70; vertical axis: Petal width (in millimeters), labeled from 5 to 25]


> Solution 


The horizontal axis represents the petal length, and the vertical axis represents 
the petal width. Each point in the scatter plot represents the petal length and 
petal width of one flower. 


Interpretation From the scatter plot, you can see that as the petal length 
increases, the petal width also tends to increase. 
> Try It Yourself 6 


The lengths of employment and the salaries of 10 employees are listed in 
the table at the left. Graph the data using a scatter plot. What can you 
conclude? 


a. Label the horizontal and vertical axes. 
b. Plot the paired data. 
c. Describe any trends. Answer: Page A32 


You will learn more about scatter plots and how to analyze them in Chapter 9. 

A data set that is composed of quantitative entries taken at regular intervals
over a period of time is called a time series. For instance, the amount of
precipitation measured each day for one month is a time series. You can use a
time series chart to graph a time series.


EXAMPLE 7 Report 8


> Constructing a Time Series Chart


The table lists the number of cellular telephone subscribers (in millions) and
subscribers’ average local monthly bills for service (in dollars) for the years
1998 through 2008. Construct a time series chart for the number of cellular
subscribers. What can you conclude? (Source: Cellular Telecommunications &
Internet Association)

See MINITAB and TI-83/84 Plus steps on pages 122 and 123.


Year    Subscribers (in millions)    Average bill (in dollars)
1998    69.2                         39.43
1999    86.0                         41.24
2000    109.5                        45.27
2001    128.4                        47.37
2002    140.8                        48.40
2003    158.7                        49.91
2004    182.1                        50.64
2005    207.9                        49.98
2006    233.0                        50.56
2007    255.4                        49.79
2008    270.3                        50.07

> Solution


Let the horizontal axis represent the years and let the vertical axis represent 
the number of subscribers (in millions). Then plot the paired data and connect 
them with line segments. 


[Time series chart: Cellular Telephone Subscribers. Horizontal axis: Year, 1998 through 2008; vertical axis: Subscribers (in millions)]


Interpretation The graph shows that the number of subscribers has been 
increasing since 1998. 
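
One way to confirm this reading without the figure is to look at the change from each year to the next. The Python sketch below is an illustration only, using the subscriber counts from the table in Example 7.

```python
# Cellular telephone subscribers (in millions), 1998 through 2008.
years = list(range(1998, 2009))
subscribers = [69.2, 86.0, 109.5, 128.4, 140.8, 158.7,
               182.1, 207.9, 233.0, 255.4, 270.3]

# Pair each year with its value, as you would before plotting the points.
series = list(zip(years, subscribers))

# Change from one year to the next; a positive change means the line segment rises.
changes = [round(b - a, 1) for a, b in zip(subscribers, subscribers[1:])]
always_increasing = all(change > 0 for change in changes)

for (year, value), change in zip(series[1:], changes):
    print(f"{year}: {value:6.1f} million ({change:+.1f} from previous year)")
print("Increasing every year:", always_increasing)
```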


> Try It Yourself 7 


Use the table in Example 7 to construct a time series chart for subscribers’ 
average local monthly cellular telephone bills for the years 1998 through 2008. 
What can you conclude? 


a. Label the horizontal and vertical axes. 
b. Plot the paired data and connect them with line segments. 
c. Describe any patterns you see. Answer: Page A32 

EXERCISES


BUILDING BASIC SKILLS AND VOCABULARY


1. Name some ways to display quantitative data graphically. Name some ways 
to display qualitative data graphically. 


2. What is an advantage of using a stem-and-leaf plot instead of a histogram?
What is a disadvantage?


3. In terms of displaying data, how is a stem-and-leaf plot similar to a dot plot? 


4. How is a Pareto chart different from a standard vertical bar graph? 


Putting Graphs in Context In Exercises 5-8, match the plot with the
description of the sample. 


5. Key: 0|8 = 0.8           6. Key: 6|7 = 67
   0 | 8                       6 | 78
   1 | 568                     7 | 455888
   2 | 1345                    8 | 1355889
   3 | 09                      9 | 00024
   4 | 00

7. and 8. [Dot plots for Exercises 7 and 8; the only surviving axis labels are 200, 205, 210, 215]


(a) Time (in minutes) it takes a sample of employees to drive to work 
(b) Grade point averages of a sample of students with finance majors 


(c) Top speeds (in miles per hour) of a sample of high-performance sports 
cars 


(d) Ages (in years) of a sample of residents of a retirement home 


Graphical Analysis In Exercises 9-12, use the stem-and-leaf plot or dot plot to 
list the actual data entries. What is the maximum data entry? What is the minimum 
data entry? 


9. Key: 2|7 = 27            10. Key: 12|9 = 12.9
   2 | 7                        12 |
   3 | 2                        12 | 9
   4 | 1334778                  13 | 3
   5 | 0112333444456689         13 | 677
   6 | 888                      14 | 1111344
   7 | 388                      14 | 699
   8 | 5                        15 | 000124
                                15 | 678889
                                16 | 1
                                16 | 67

11. [Dot plot: horizontal axis labeled from 13 to 19]
12. [Dot plot: horizontal axis labeled from 215 to 235 in steps of 5]

USING AND INTERPRETING CONCEPTS


Graphical Analysis In Exercises 13-16, give three conclusions that can be 
drawn from the graph. 


13. [Graph: Average Time Spent on Top 5 Social Networking Sites; horizontal axis: Site] (Source: Experian Hitwise)

14. [Graph: Motor Vehicle Thefts in U.S.; horizontal axis: Year, 2003 through 2008] (Source: Federal Bureau of Investigation)

15. [Pie chart: How Other Drivers Irk Us. Tailgating 23%, Using cell phone 21%, Driving slow 13%, No signals 13%, Other 10%, Speeding 7%, Using two parking spots 4%, Bright lights 4%, Ignoring signals 3%, Too cautious 2%] (Adapted from Reuters/Zogby)

16. [Graph: Driving and Cell Phone Use; horizontal axis: Incident (Swerved, Sped up, Cut off a car, Almost hit a car)] (Adapted from USA Today)


Graphing Data Sets In Exercises 17-30, organize the data using the
indicated type of graph. What can you conclude about the data? 


17. Exam Scores Use a stem-and-leaf plot to display the data. The data
represent the scores of a biology class on a midterm exam. 


75 85 90 80 87 67 82 88 95 91 73 80 
83 92 94 68 75 91 79 95 87 76 91 85 


18. Highest Paid CEOs Use a stem-and-leaf plot that has two rows for
each stem to display the data. The data represent the ages of the top 
30 highest paid CEOs. (Source: Forbes) 


64 74 55 55 62 63 50 67 51 59 50
52 50 59 62 64 57 61 49 63 62 60 
55 56 48 58 64 60 60 57 


19. Ice Thickness Use a stem-and-leaf plot to display the data. The
data represent the thicknesses (in centimeters) of ice measured at 
20 different locations on a frozen lake. 


5.8 6.4 6.9 7.2 5.1 4.9 4.3 5.8 7.0 6.8
8.1 7.5 7.2 6.9 5.8 7.2 8.0 7.0 6.9 5.9


20. Apple Prices Use a stem-and-leaf plot to display the data. The data
represent the prices (in cents per pound) paid to 28 farmers for apples. 


19.2 19.6 16.4 17.1 19.0 17.4 17.3
20.1 19.0 17.5 17.6 18.6 18.4 17.7
19.5 18.4 18.9 17.5 19.3 20.8 19.3
18.6 18.6 18.3 17.1 18.1 16.8 17.9

28


9.98 
10.79 
11.71 
11.80 
11.51 
13.65 
12.05 
10.54 
10.33 
11.57 
10.17 


12.16 

21. Systolic Blood Pressures Use a dot plot to display the data. The data
represent the systolic blood pressures (in millimeters of mercury) of
30 patients at a doctor’s office.


120 135 140 145 130 150 120 170 145 125
130 110 160 180 200 150 200 135 140 120
120 130 140 170 120 165 150 130 135 140


22. Life Spans of Houseflies Use a dot plot to display the data. The data
represent the life spans (in days) of 40 houseflies.


9 9 4 4 8 11 10 5 8 13 9
6 7 11 13 11 6 9 8 14 10 6
10 10 8 7 14 11 7 8 6 11 13
10 14 14 8 13 14 10


23. New York City Marathon Use a pie chart to display the data. The data
represent the number of men’s New York City Marathon winners from each
country through 2009. (Source: New York Road Runners)


United States 15      Mexico 4
Italy 4               Morocco 1
Ethiopia 1            Great Britain 1
South Africa 2        Brazil 2
Tanzania 1            New Zealand 1
Kenya 8


24. NASA Budget Use a pie chart to display the data. The data represent
the 2010 NASA budget request (in millions of dollars) divided among five
categories. (Source: NASA)


Science, aeronautics, exploration    8947
Space operations                     6176
Education                            126
Cross-agency support                 3401
Inspector general                    36


25. Barrel of Oil Use a Pareto chart to display the data. The data represent how
a 42-gallon barrel of crude oil is distributed. (Adapted from American Petroleum
Institute)


Gasoline                                                    43%
Kerosene-type jet fuel                                      9%
Distillate fuel oil (home heating, diesel fuel, etc.)       24%
Coke                                                        5%
Residual fuel oil (industry, marine transportation, etc.)   4%
Liquefied refinery gases                                    3%
Other                                                       12%


26. UV Index Use a Pareto chart to display the data. The data represent the
ultraviolet indices for five cities at noon on a recent date. (Source: National
Oceanic and Atmospheric Administration)


Atlanta, GA    Boise, ID    Concord, NH    Denver, CO    Miami, FL
9              7            8              7             10


TABLE FOR EXERCISE 27


27. Hourly Wages Use a scatter plot to display the data shown in the 
table. The data represent the number of hours worked and the hourly 
wages (in dollars) for a sample of 12 production workers. Describe any 
trends shown. 

28. Salaries Use a scatter plot to display the data shown in the table. The
data represent the number of students per teacher and the average
teacher salaries (in thousands of dollars) for a sample of 10 school
districts. Describe any trends shown.


TABLE FOR EXERCISE 28

Students per teacher    Average teacher salary
17.1                    28.7
17.5                    47.5
18.9                    31.8
17.1                    28.1
20.0                    40.3
18.6                    33.8
14.4                    49.8
16.5                    37.5
13.3                    42.5
18.4                    31.9


29. Daily High Temperatures Use a time series chart to display the data. The
data represent the daily high temperatures for a city for a period of 12 days.


May 1    May 2          May 3    May 4     May 5     May 6
77°      [illegible]    79°      81°       82°       82°

May 7    May 8          May 9    May 10    May 11    May 12
85°      87°            90°      88°       89°       82°


30. Manufacturing Use a time series chart to display the data. The data
represent the percentages of the U.S. gross domestic product (GDP)
that come from the manufacturing sector. (Source: U.S. Bureau of
Economic Analysis)


1997     1998     1999     2000     2001     2002
16.6%    15.4%    14.8%    14.5%    13.2%    12.9%

2003     2004     2005     2006     2007     2008
12.5%    12.2%    11.9%    12.0%    11.7%    11.5%


In Exercises 31-34, use StatCrunch to organize the data using the indicated 
type of graph. What can you conclude about the data? 


31. Use a stem-and-leaf plot to display the data. The data represent the scores of 
an economics class on a final exam. 


82 93 95 75 68 90 98 71 85 88 100 93 
70 80 89 62 55 95 83 86 88 76 99 87 


32. Use a dot plot to display the data. The data represent the screen sizes (in 
inches) of 20 DVD camcorders. 


3.0 2.7 3.2 2.7 1.8 2.7 2.7 3.0 2.7 3.0
2.5 3.2 2.7 2.7 3.0 2.7 2.0 2.7 3.0 2.5


33. Use (a) a pie chart and (b) a Pareto chart to display the data. The data 
represent the results of an online survey that asked adults which type of 
investment they would focus on in 2010. (Adapted from CNN) 


U.S. stocks      11,521      Emerging markets    5267
Bonds            3292        Commodities         1975
Bank accounts    10,533


34. The data represent the number of motor vehicles (in millions) registered 
in the U.S. and the number of crashes (in millions). (Source: U.S. National 
Highway Safety Traffic Administration) 


Year                           2000           2001           2002   2003   2004   2005   2006   2007
Registrations (in millions)    221            230            230    231    237    241    244    247
Crashes (in millions)          [illegible]    [illegible]    6.3    6.3    6.2    6.2    6.0    6.0


(a) Use a scatter plot to display the number of registrations. 


(b) Use a scatter plot to display the number of crashes. 
(c) Construct a time series chart for the number of registrations. 
(d) Construct a time series chart for the number of crashes. 

FIGURE FOR EXERCISE 39

[Back-to-back stem-and-leaf plot: Law Firm A (leaves on the left) and Law Firm B (leaves on the right)]

Key: 5|19|0 = $195,000 for Law Firm A and $190,000 for Law Firm B


EXTENDING CONCEPTS


A Misleading Graph? A misleading graph is a statistical graph that is 
not drawn appropriately. This type of graph can misrepresent data and lead to 
false conclusions. In Exercises 35-38, (a) explain why the graph is misleading, and 
(b) redraw the graph so that it is not misleading. 


35. [Bar graph: Sales for Company A. Horizontal axis: Quarter, in the order 3rd, 2nd, 1st, 4th; vertical axis: Sales (in thousands of dollars)]

36. [Bar graph: Results of a Survey. Horizontal axis: Type of student (Middle school, High school, College/university); vertical axis: Percent that responded "yes"]

37. [Pie chart: Sales for Company B. 1st quarter 38%, 2nd quarter 4%, 3rd quarter 38%, 4th quarter 20%]

38. [Bar graph: U.S. Crude Oil Imports by Country of Origin 2008. Bars: non-OPEC countries, OPEC countries]


39. Law Firm Salaries A back-to-back stem-and-leaf plot compares two data
sets by using the same stems for each data set. Leaves for the first data set
are on one side while leaves for the second data set are on the other side. The
back-to-back stem-and-leaf plot shows the salaries (in thousands of dollars)
of all lawyers at two small law firms.


(a) What are the lowest and highest salaries at Law Firm A? at Law Firm B?
(b) How many lawyers are in each firm?
(c) Compare the distribution of salaries at each law firm. What do you
notice?


40. Yoga Classes The data sets show the ages of all participants in two yoga
classes. 

3:00 p.m. Class 8:00 p.m. Class 

40 60 73 77 51 68 19 18 20 29 39 43 

68 35 68 53 64 75 71 56 44 44 18 19 

76 69 59 55 38 57 19 18 18 20 25 29

68 84 75 62 73 75 25 22 31 24 24 23 

85 77 19 19 18 28 20 31 


(a) Make a back-to-back stem-and-leaf plot to display the data. 


(b) What are the lowest and highest ages of participants in the 3:00 P.M.
class? in the 8:00 P.M. class? 


(c) How many participants are in each class? 


(d) Compare the distribution of ages in each class. What conclusion(s) can 
you make based on your observations? 

> How to find the mean, median, and mode of a population and of a sample

> How to find a weighted mean of a data set and the mean of a frequency
distribution

> How to describe the shape of a distribution as symmetric, uniform, or
skewed and how to compare the mean and median for each

STUDY TIP


Notice that the mean in Example 1 has one more decimal place than the
original set of data values. This round-off rule will be used throughout the
text. Another important round-off rule is that rounding should not be done
until the final answer of a calculation.


74 78 81 87 81 80 77 80 
85 78 80 83 75 81 73 

Measures of Central Tendency


WHAT YOU SHOULD LEARN 


> Mean, Median, and Mode
> Weighted Mean and Mean of Grouped Data
> The Shapes of Distributions


> MEAN, MEDIAN, AND MODE 


In Sections 2.1 and 2.2, you learned about the graphical representations of 
quantitative data. In Sections 2.3 and 2.4, you will learn how to supplement 
graphical representations with numerical statistics that describe the center and 
variability of a data set. 

A measure of central tendency is a value that represents a typical, or central, 
entry of a data set. The three most commonly used measures of central tendency 
are the mean, the median, and the mode. 


DEFINITION 


The mean of a data set is the sum of the data entries divided by the number
of entries. To find the mean of a data set, use one of the following formulas.

Population Mean: \( \mu = \frac{\sum x}{N} \)        Sample Mean: \( \bar{x} = \frac{\sum x}{n} \)

The lowercase Greek letter \( \mu \) (pronounced mu) represents the population
mean and \( \bar{x} \) (read as “x bar”) represents the sample mean. Note that N
represents the number of entries in a population and n represents the
number of entries in a sample. Recall that the uppercase Greek letter sigma
(\( \Sigma \)) indicates a summation of values.


EXAMPLE 1 Report 9


> Finding a Sample Mean 


The prices (in dollars) for a sample of round-trip flights from Chicago, Illinois 
to Cancun, Mexico are listed. What is the mean price of the flights? 


872 432 397 427 388 782 397 

> Solution
The sum of the flight prices is

\( \sum x = 872 + 432 + 397 + 427 + 388 + 782 + 397 = 3695. \)

To find the mean price, divide the sum of the prices by the number of prices in
the sample.

\( \bar{x} = \frac{\sum x}{n} = \frac{3695}{7} \approx 527.9 \)


So, the mean price of the flights is about $527.90. 
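
The same arithmetic can be checked in Python. This is a small illustrative sketch using the seven flight prices from Example 1; the variable names are arbitrary.

```python
# Round-trip flight prices (in dollars) from Example 1.
prices = [872, 432, 397, 427, 388, 782, 397]

total = sum(prices)      # sum of the data entries
n = len(prices)          # number of entries in the sample
sample_mean = total / n  # x-bar = (sum of x) / n

print(f"sum = {total}, n = {n}, mean = {sample_mean:.1f}")
```

The mean is rounded only when it is printed, in keeping with the rule that rounding should wait until the final answer.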


> Try It Yourself 1 


The heights (in inches) of the players on the 2009-2010 Cleveland Cavaliers 
basketball team are listed. What is the mean height? 


a. Find the sum of the data entries. 
b. Divide the sum by the number of data entries. 
c. Interpret the results in the context of the data. Answer: Page A32 

STUDY TIP


In a data set, there are the same number of data values above the median as
there are below the median. For instance, in Example 2, three of the prices
are below $427 and three are above $427.

DEFINITION


The median of a data set is the value that lies in the middle of the data when 
the data set is ordered. The median measures the center of an ordered data set 
by dividing it into two equal parts. If the data set has an odd number of entries, 
the median is the middle data entry. If the data set has an even number of 
entries, the median is the mean of the two middle data entries. 


EXAMPLE 2 Report 10


> Finding the Median
Find the median of the flight prices given in Example 1. 


> Solution 
To find the median price, first order the data. 


388 397 397 427 432 782 872 


Because there are seven entries (an odd number), the median is the middle, 
or fourth, data entry. So, the median flight price is $427. 

> Try It Yourself 2 

The ages of a sample of fans at a rock concert are listed. Find the median age. 


24 27 19 21 18 23 21 20 19 33 30 29 21 
18 24 26 38 19 35 34 33 30 21 27 30 


a. Order the data entries. 
b. Find the middle data entry. 
c. Interpret the results in the context of the data. Answer: Page A32 


EXAMPLE 3 


> Finding the Median 

In Example 2, the flight priced at $432 is no longer available. What is the 
median price of the remaining flights? 

> Solution 

The remaining prices, in order, are 388, 397, 397, 427, 782, and 872. 


Because there are six entries (an even number), the median is the mean of the 
two middle entries. 


\( \text{Median} = \frac{397 + 427}{2} = 412 \)


So, the median price of the remaining flights is $412. 
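
The odd and even cases in the definition of the median translate directly into code. The function below is a minimal sketch for illustration; Python's standard library also provides statistics.median, which gives the same results.

```python
def median(values):
    """Middle entry of the ordered data, or the mean of the two middle entries."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of entries: the median is the middle data entry.
        return ordered[mid]
    # Even number of entries: the median is the mean of the two middle entries.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Flight prices from Example 1 (seven entries).
prices = [872, 432, 397, 427, 388, 782, 397]
print(median(prices))

# Example 3: the flight priced at $432 is no longer available (six entries).
remaining = [p for p in prices if p != 432]
print(median(remaining))
```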


> Try It Yourself 3 


The prices (in dollars) of a sample of digital photo frames are listed. Find the 
median price of the digital photo frames. 


25 100 130 60 140 200 220 80 250 97 


a. Order the data entries. 
b. Find the mean of the two middle data entries. 
c. Interpret the results in the context of the data. Answer: Page A32 

DEFINITION


The mode of a data set is the data entry that occurs with the greatest frequency. 
A data set can have one mode, more than one mode, or no mode. If no 
entry is repeated, the data set has no mode. If two entries occur with the 
same greatest frequency, each entry is a mode and the data set is called 
bimodal. 


EXAMPLE 4 Report 11


> Finding the Mode

Find the mode of the flight prices given in Example 1.


> Solution

Ordering the data helps to find the mode.


388 397 397 427 432 782 872


From the ordered data, you can see that the entry 397 occurs twice, whereas
the other data entries occur only once. So, the mode of the flight prices is $397.


INSIGHT

The mode is the only measure of central tendency that can be used to describe
data at the nominal level of measurement. But when working with quantitative
data, the mode is rarely used.


> Try It Yourself 4


The prices (in dollars per square foot) for a sample of South Beach (Miami 
Beach, FL) condominiums are listed. Find the mode of the prices. 


324 462 540 450 638 564 670 618 624 825 
540 980 1650 1420 670 830 912 750 1260 450 
975 670 1100 980 750 723 705 385 475 720 


a. Write the data in order. 
b. Identify the entry, or entries, that occur with the greatest frequency. 
c. Interpret the results in the context of the data. Answer: Page A32 


EXAMPLE 5


At a political debate, a sample of audience members were asked to name the
political party to which they belonged. Their responses are shown in the table.
What is the mode of the responses?


Political party     Frequency, f
Democrat            34
Republican          36
Other               21
Did not respond     9


> Solution


The response occurring with the greatest frequency is Republican. So, the 
mode is Republican. 


Interpretation In this sample, there were more Republicans than people of 
any other single affiliation. 
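
The definition of the mode, including the "no mode" and "more than one mode" cases, can be sketched in Python as follows. This is an illustration only: the helper name modes is hypothetical, and the party frequencies assume the table in Example 5 reads Democrat 34, Republican 36, Other 21, Did not respond 9.

```python
from collections import Counter


def modes(counts):
    """Return every entry tied for the greatest frequency, or [] if no entry repeats."""
    top = max(counts.values())
    if top == 1:
        return []
    return [entry for entry, frequency in counts.items() if frequency == top]


# Quantitative data: flight prices from Example 1.
prices = [872, 432, 397, 427, 388, 782, 397]
price_modes = modes(Counter(prices))

# Nominal data: responses from Example 5, already tallied (assumed row labels).
party_counts = {"Democrat": 34, "Republican": 36, "Other": 21, "Did not respond": 9}
party_modes = modes(party_counts)

print(price_modes)
print(party_modes)
```

Returning a list makes the bimodal case visible: a data set such as 1, 1, 2, 2 yields two modes rather than one.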


> Try It Yourself 5 


In a survey, 1000 U.S. adults were asked if they thought public cellular
phone conversations were rude. Of those surveyed, 510 responded “Yes,” 
370 responded “No,” and 120 responded “Not sure.” What is the mode of the 
responses? (Adapted from Fox TV/Rasmussen Reports) 


a. Identify the entry that occurs with the greatest frequency. 
b. Interpret the results in the context of the data. Answer: Page A32 

Ages in a class

20 20 20 20 20 20 21
21 21 21 22 22 22 23
23 23 23 24 24 65

Outlier: 65


The National Association of 
Realtors keeps a databank of 
existing-home sales. One list 

uses the median price of existing 
homes sold and another uses 

the mean price of existing homes 
sold. The sales for the third 
quarter of 2009 are shown 

in the double-bar graph. 


(Source: National Association of Realtors) 


[Double-bar graph: 2009 U.S. Existing-Home Sales. Horizontal axis: Month (July, Aug., Sept.); vertical axis: Existing-home price; bars: Median price and Mean price]

Notice in the graph that each 
month the mean price is about 
$45,000 more than the median 
price. What factors would cause 
the mean price to be greater 
than the median price? 


Although the mean, the median, and the mode each describe a typical entry 
of a data set, there are advantages and disadvantages of using each. The mean is 
a reliable measure because it takes into account every entry of a data set. 
However, the mean can be greatly affected when the data set contains outliers. 


DEFINITION 


An outlier is a data entry that is far removed from the other entries in the 
data set. 


A data set can have one or more outliers, causing gaps in a distribution. 
Conclusions that are drawn from a data set that contains outliers may be flawed. 


EXAMPLE 6 


> Comparing the Mean, the Median, and the Mode

Find the mean, the median, and the mode of the sample ages of students in 
a class shown at the left. Which measure of central tendency best describes a 
typical entry of this data set? Are there any outliers? 


> Solution

Mean: \( \bar{x} = \frac{\sum x}{n} = \frac{475}{20} \approx 23.8 \) years

Median: \( \text{Median} = \frac{21 + 22}{2} = 21.5 \) years

Mode: The entry occurring with the greatest frequency is 20 years. 


Interpretation The mean takes every entry into account but is influenced by 
the outlier of 65. The median also takes every entry into account, and it is not 
affected by the outlier. In this case the mode exists, but it doesn’t appear to 
represent a typical entry. Sometimes a graphical comparison can help you 
decide which measure of central tendency best represents a data set. The 
histogram shows the distribution of the data and the locations of the mean, the 
median, and the mode. In this case, it appears that the median best describes 
the data set. 
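
The three measures for this data set can be reproduced with Python's standard statistics module. The sketch below is an illustration only; the ages are the 20 sample entries for Example 6 (six 20s, four 21s, three 22s, four 23s, two 24s, and 65).

```python
import statistics

# Sample ages of students in a class (Example 6).
ages = [20] * 6 + [21] * 4 + [22] * 3 + [23] * 4 + [24] * 2 + [65]

mean_age = statistics.mean(ages)      # pulled upward by the entry 65
median_age = statistics.median(ages)  # mean of the 10th and 11th ordered entries
mode_age = statistics.mode(ages)      # most frequent entry

print(f"mean = {mean_age}, median = {median_age}, mode = {mode_age}")

# How far the largest entry sits from the median, compared with the smallest entry.
print(max(ages) - median_age, median_age - min(ages))
```

The last line gives a rough sense of why 65 is treated as an outlier: it lies far from the center, while the smallest entry does not.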


[Histogram: Ages of Students in a Class, with the locations of the mean, the median, the mode, and the outlier marked; the horizontal axis extends to 65]


> Try It Yourself 6 


Remove the data entry 65 from the data set in Example 6. Then rework the 
example. How does the absence of this outlier change each of the measures? 


a. Find the mean, the median, and the mode. 
b. Compare these measures of central tendency with those found in Example 6. 
Answer: Page A33 

> WEIGHTED MEAN AND MEAN OF GROUPED DATA
Sometimes data sets contain entries that have a greater effect on the mean 
than do other entries. To find the mean of such a data set, you must find the 
weighted mean. 


DEFINITION 


A weighted mean is the mean of a data set whose entries have varying weights. 
A weighted mean is given by 
$$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$$


where w is the weight of each entry x. 


EXAMPLE 7 


> Finding a Weighted Mean 

You are taking a class in which your grade is determined from five sources:
50% from your test mean, 15% from your midterm, 20% from your final exam, 
10% from your computer lab work, and 5% from your homework. Your scores 
are 86 (test mean), 96 (midterm), 82 (final exam), 98 (computer lab), and 100 
(homework). What is the weighted mean of your scores? If the minimum 
average for an A is 90, did you get an A? 




> Solution 
Begin by organizing the scores and the weights in a table. 


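Source          Score, x    Weight, w    x·w
Test mean          86         0.50       43.0
Midterm            96         0.15       14.4
Final exam         82         0.20       16.4
Computer lab       98         0.10        9.8
Homework          100         0.05        5.0
                           Σw = 1.00    Σ(x·w) = 88.6


$$\bar{x} = \frac{\sum (x \cdot w)}{\sum w} = \frac{88.6}{1} = 88.6$$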
Your weighted mean for the course is 88.6. So, you did not get an A. 
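
The weighted-mean calculation in Example 7 can also be checked with a short program. The sketch below is only an illustration; the function name `weighted_mean` is a hypothetical helper, not something from the text.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Scores and weights from Example 7.
scores = [86, 96, 82, 98, 100]
weights = [0.50, 0.15, 0.20, 0.10, 0.05]

course_mean = weighted_mean(scores, weights)
print(round(course_mean, 1))  # 88.6
```

Dividing by the sum of the weights means the same function works when the weights are counts (credits, days, number of students) rather than percents that add up to 1.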


> Try It Yourself 7 


An error was made in grading your final exam. Instead of getting 82, you 
scored 98. What is your new weighted mean? 


a. Multiply each score by its weight and find the sum of these products. 

b. Find the sum of the weights. 

c. Find the weighted mean. 

d. Interpret the results in the context of the data. Answer: Page A33 
STUDY TIP


If the frequency distribution represents a population, then the mean of the
frequency distribution is approximated by

$$\mu = \frac{\sum (x \cdot f)}{N}$$

where $N = \sum f$.


Class midpoint, x    Frequency, f    x·f
      12.5                6           75.0
      24.5               10          245.0
      36.5               13          474.5
      48.5                8          388.0
      60.5                5          302.5
      72.5                6          435.0
      84.5                2          169.0
                       n = 50     Σ(x·f) = 2089.0



If data are presented in a frequency distribution, you can approximate the
mean as follows.


DEFINITION 


The mean of a frequency distribution for a sample is approximated by 


$$\bar{x} = \frac{\sum (x \cdot f)}{n}$$     Note that $n = \sum f$.


where x and f are the midpoints and frequencies of a class, respectively.


GUIDELINES 
Finding the Mean of a Frequency Distribution 


IN WORDS                                      IN SYMBOLS
1. Find the midpoint of each class.           $x = \frac{(\text{Lower limit}) + (\text{Upper limit})}{2}$
2. Find the sum of the products of the        $\sum (x \cdot f)$
   midpoints and the frequencies.
3. Find the sum of the frequencies.           $n = \sum f$
4. Find the mean of the frequency             $\bar{x} = \frac{\sum (x \cdot f)}{n}$
   distribution.


EXAMPLE 8 


> Finding the Mean of a Frequency Distribution 

Use the frequency distribution at the left to approximate the mean number of 
minutes that a sample of Internet subscribers spent online during their most 
recent session. 

> Solution 


$$\bar{x} = \frac{\sum (x \cdot f)}{n} = \frac{2089.0}{50} \approx 41.8$$


So, the mean time spent online was approximately 41.8 minutes. 
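
As a check on Example 8, the four guideline steps can be sketched in a few lines of code. The midpoints and frequencies are the ones tabulated for the example; the function name `grouped_mean` is a hypothetical helper chosen for this illustration.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate the mean of grouped data as sum(x * f) / sum(f)."""
    n = sum(frequencies)                      # step 3: sum of the frequencies
    if n == 0:
        raise ValueError("frequencies must not sum to zero")
    products = sum(x * f for x, f in zip(midpoints, frequencies))  # step 2
    return products / n                       # step 4


# Class midpoints (step 1 already done) and frequencies from Example 8.
midpoints = [12.5, 24.5, 36.5, 48.5, 60.5, 72.5, 84.5]
frequencies = [6, 10, 13, 8, 5, 6, 2]

mean_minutes = grouped_mean(midpoints, frequencies)
print(round(mean_minutes, 2))  # 41.78
```

The result is only an approximation of the true mean, because every entry in a class is treated as if it were equal to the class midpoint.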


> Try It Yourself 8 


Use a frequency distribution to approximate the mean age of the 50 richest 
people. (See Try It Yourself 2 on page 41.) 


a. Find the midpoint of each class. 
b. Find the sum of the products of each midpoint and corresponding frequency. 
c. Find the sum of the frequencies. 


d. Find the mean of the frequency distribution. Answer: Page A33 
To explore this topic further,
see Activity 2.3 on page 79.


INSIGHT 


Be aware that there are many 
different shapes of distributions. 
In some cases, the shape cannot 
be classified as symmetric, 
uniform, or skewed. A 
distribution can have 
several gaps caused by 
outliers, or clusters of 
data. Clusters occur 
when several types of 
data are included in the 
one data set. 
>» THE SHAPES OF DISTRIBUTIONS


A graph reveals several characteristics of a frequency distribution. One such 
characteristic is the shape of the distribution. 


DEFINITION 


A frequency distribution is symmetric when a vertical line can be drawn 
through the middle of a graph of the distribution and the resulting halves are 
approximately mirror images. 


A frequency distribution is uniform (or rectangular) when all entries, or 
classes, in the distribution have equal or approximately equal frequencies. 
A uniform distribution is also symmetric. 


A frequency distribution is skewed if the “tail” of the graph elongates more to 
one side than to the other. A distribution is skewed left (negatively skewed) if 
its tail extends to the left. A distribution is skewed right (positively skewed) 
if its tail extends to the right. 


When a distribution is symmetric and unimodal, the mean, median, and mode are 
equal. If a distribution is skewed left, the mean is less than the median and the 
median is usually less than the mode. If a distribution is skewed right, the mean 
is greater than the median and the median is usually greater than the mode. 
Examples of these commonly occurring distributions are shown. 


[Figures: four histograms titled Symmetric Distribution, Uniform Distribution, Skewed Left Distribution, and Skewed Right Distribution, with the measures of central tendency marked on each horizontal axis]


The mean will always fall in the direction in which the distribution is skewed. 
For instance, when a distribution is skewed left, the mean is to the left of 
the median. 
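
The relationship between skewness and the position of the mean can be illustrated numerically. The two data sets below are hypothetical (they do not come from the text) and were chosen only so that one has a long tail to the right and the other is its mirror image.

```python
from statistics import mean, median, mode

# Hypothetical data: most entries are small, with a long tail to the right.
skewed_right = [1, 2, 2, 3, 3, 3, 4, 4, 5, 15]
# Mirror image of the same data, so the long tail points to the left.
skewed_left = [16 - x for x in skewed_right]

for name, data in [("skewed right", skewed_right), ("skewed left", skewed_left)]:
    print(name, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

In the first set the mean is pulled above the median by the entry 15; in the mirrored set the mean is pulled below the median by the entry 1.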
EXERCISES


BUILDING BASIC SKILLS AND VOCABULARY


True or False? In Exercises 1-4, determine whether the statement is true or 
false. If it is false, rewrite it as a true statement. 


1. The mean is the measure of central tendency most likely to be affected by an 
outlier. 


2. Some quantitative data sets do not have medians. 
3. A data set can have the same mean, median, and mode. 


4. When each data class has the same frequency, the distribution is symmetric. 


Constructing Data Sets In Exercises 5-8, construct the described data set.
The values in the data set cannot all be the same. 


5. Median and mode are the same. 6. Mean and mode are the same. 
7. Mean is not representative of a typical number in the data set. 


8. Mean, median, and mode are the same. 


Graphical Analysis In Exercises 9-12, determine whether the approximate
shape of the distribution in the histogram is symmetric, uniform, skewed left, 
skewed right, or none of these. Justify your answer. 


[Figures for Exercises 9-12: four frequency histograms, one per exercise]


Matching In Exercises 13-16, match the distribution with one of the graphs in
Exercises 9-12. Justify your decision. 


13. The frequency distribution of 180 rolls of a dodecagon (a 12-sided die) 


14. The frequency distribution of salaries at a company where a few executives 
make much higher salaries than the majority of employees 


15. The frequency distribution of scores on a 90-point test where a few students 
scored much lower than the majority of students 


16. The frequency distribution of weights for a sample of seventh grade boys 
TABLE FOR EXERCISE 23

Contacts                     40
Eyeglasses                  570
Contacts and eyeglasses     180
None                        210


FIGURE FOR EXERCISE 26

[Pie chart: "How Do People Eat Their Potatoes?" Legible slices: Home fries/hash browns 120, Chips 50]
USING AND INTERPRETING CONCEPTS


Finding and Discussing the Mean, Median, and Mode In Exercises
17-34, find the mean, median, and mode of the data, if possible. If any of these
measures cannot be found or a measure does not represent the center of the data,
explain why.


17. Concert Tickets The number of concert tickets purchased online for the
last 13 purchases

4 2 5 8 6 6 4 3 2 4 7 8 5


18. Tuition The 2009-2010 tuition and fees (in thousands of dollars) for the
top 10 liberal arts colleges (Source: U.S. News and World Report)

39 39 38 51 38 40 37 40 35 39


19. MCAT Scores The average medical college admission test (MCAT) scores
for a sample of seven medical schools (Source: Association of American Medical
Colleges)

11.0 11.7 10.3 11.7 11.7 10.7 9.7


20. Cholesterol The cholesterol levels of a sample of 10 female employees

154 240 171 188 235 203 184 173 181 275


21. NFL The average points per game scored by each NFL team during
the 2009 regular season (Source: National Football League)

20.4 19.7 17.5 26.7 22.7 21.8 16.6 29.4
26.0 22.5 28.8 19.1 18.1 12.3 16.4 15.2
16.1 23.4 20.6 18.4 23.0 25.1 26.8 31.9
24.4 28.4 20.4 22.1 15.3 10.9 24.2 22.6


22. Power Failures The durations (in minutes) of power failures at a
residence in the last 10 years

18 26 45 75 125 80 33 40 44 49
89 80 96 125 12 61 31 63 103 28


23. Eyeglasses and Contacts The responses of a sample of 1000 adults who
were asked what type of corrective lenses they wore are shown in the table
at the left. (Adapted from American Optometric Association)


24. Living on Your Own The responses of a sample of 1177 young adults who
were asked what surprised them the most as they began to live on their own
(Adapted from Charles Schwab)

Amount of first salary: 63          Trying to find a job: 125
Number of decisions: 163            Money needed: 326
Paying bills: 150                   Trying to save: 275
How hard it is breaking away from parents: 75


25. Top Speeds The top speeds (in miles per hour) for a sample of seven
sports cars

187.3 181.8 180.0 169.3 162.2 158.1 155.7


26. Potatoes The pie chart at the left shows the responses of a sample of
1000 adults who were asked their favorite way to eat potatoes. (Adapted from
Idaho Potato Commission)
27. Typing Speeds The typing speeds (in words per minute) for several
stenographers

125 140 170 155 132 175 225 210 125 230


28. Eating Disorders The number of weeks it took to reach a target weight for
a sample of five patients with eating disorders treated by psychodynamic
psychotherapy (Source: The Journal of Consulting and Clinical Psychology)

15.0 31.5 10.0 25.5 1.0


29. Eating Disorders The number of weeks it took to reach a target weight for
a sample of 14 patients with eating disorders treated by psychodynamic
psychotherapy and cognitive behavior techniques (Source: The Journal of
Consulting and Clinical Psychology)

2.5 20.0 11.0 10.5 17.5 16.5 13.0
15.5 26.5 2.5 27.0 28.5 1.5 5.0


30. Aircraft The number of aircraft that 15 airlines have in their fleets (Source:
Airline Transport Association)

136 110 38 625 350 755 52 32
142 9 537 28 409 354 28

31. Weights (in pounds) of Carry-On Luggage on a Plane

0 | 6 7                  Key: 3 | 2 = 32
1 | 2 5 8 9
2 | 0 4 4 4 5 8 9
3 | 2 2 3 5 5 5 6 8 9
4 | 0 1 2 7 8
5 | 1


32. Grade Point Averages of Students in a Class

0 | 8                    Key: 0 | 8 = 0.8
1 | 5 6 8
2 | 1 3 4 5
3 | 0 9
4 | 0 0


33. Time (in minutes) It Takes Employees to Drive to Work

[Dot plot; horizontal axis from 5 to 40 in steps of 5]


34. Prices (in dollars per night) of Hotel Rooms in a City

[Dot plot; horizontal axis from 160 to 240 in steps of 20]


Graphical Analysis In Exercises 35 and 36, the letters A, B, and C are
marked on the horizontal axis. Describe the shape of the data. Then determine
which is the mean, which is the median, and which is the mode. Justify your
answers.


35. Sick Days Used by Employees

[Histogram; horizontal axis "Number of days," with A, B, and C marked]


36. Hourly Wages of Employees

[Histogram; horizontal axis "Hourly wage," with A, B, and C marked]
In Exercises 37-40, without performing any calculations, determine which measure
of central tendency best represents the graphed data. Explain your reasoning.


37. What Do You Think About "Green" Products?
(Adapted from Green Home Furnishings Consumer Study)

[Bar graph; horizontal axis "Response"]


38. Heights of Players on Two Opposing Volleyball Teams

[Graph; horizontal axis "Height (in inches)," from 70 to 77]


39. Heart Rate of a Sample of Adults

[Histogram; horizontal axis "Heart rate (beats per minute)," from 55 to 85; vertical axis "Frequency"]


40. Body Mass Index (BMI) of People in a Gym

[Histogram; horizontal axis "BMI," from 18 to 30; vertical axis "Frequency"]

Finding the Weighted Mean In Exercises 41-46, find the weighted mean of 
the data. 


41. Final Grade The scores and their percents of the final grade for a statistics
student are given. What is the student’s mean score?

              Score    Percent of final grade
Homework        85            5%
Quizzes         80           35%
Project        100           20%
Speech          90           15%
Final exam      93           25%


42. Salaries The average starting salaries (by degree attained) for 25 employees
at a company are given. What is the mean starting salary for these employees?

8 with MBAs: $92,500          17 with BAs in business: $68,000


43. Account Balance For the month of April, a checking account has a balance
of $523 for 24 days, $2415 for 2 days, and $250 for 4 days. What is the
account’s mean daily balance for April?


44. Account Balance For the month of May, a checking account has a balance
of $759 for 15 days, $1985 for 5 days, $1410 for 5 days, and $348 for 6 days.
What is the account’s mean daily balance for May?


45. Grades A student receives the following grades, with an A worth 4 points,
a B worth 3 points, a C worth 2 points, and a D worth 1 point. What is the
student’s mean grade point score?

B in 2 three-credit classes          D in 1 two-credit class
A in 1 four-credit class             C in 1 three-credit class
46. Scores The mean scores for students in a statistics course (by major) are
given. What is the mean score for the class? 
9 engineering majors: 85 
5 math majors: 90 
13 business majors: 81 


47. Final Grade In Exercise 41, an error was made in grading your final exam. 
Instead of getting 93, you scored 85. What is your new weighted mean? 


48. Grades In Exercise 45, one of the student’s B grades gets changed to an A. 
What is the student’s new mean grade point score? 


Finding the Mean of Grouped Data In Exercises 49-52, approximate the 
mean of the grouped data. 


49. Fuel Economy The highway mileage (in miles per gallon) for
30 small cars

Mileage (miles per gallon)    Frequency
         29-33                   11
         34-38                   12
         39-43                    2
         44-48                    5


50. Fuel Economy The city mileage (in miles per gallon) for 24 family
sedans

Mileage (miles per gallon)    Frequency
         22-27                   16
         28-33                    2
         34-39                    2
         40-45                    3
         46-51                    1


51. Ages The ages of residents of a town

Age      Frequency
0-9         55
10-19       70
20-29       35
30-39       56
40-49       74
50-59       42
60-69       38
70-79       17
80-89       10


52. Phone Calls The lengths of calls (in minutes) made by a
salesperson in one week

Length of call    Number of calls
     1-5               12
     6-10              26
    11-15              20
    16-20               7
    21-25              11
    26-30               7
    31-35               4
    36-40               4
    41-45               1

Identifying the Shape of a Distribution In Exercises 53-56, construct a
frequency distribution and a frequency histogram of the data using the indicated
number of classes. Describe the shape of the histogram as symmetric, uniform,
negatively skewed, positively skewed, or none of these.


53. Hospital Beds
Number of classes: 5 
Data set: The number of beds in a sample of 24 hospitals 


149 167 162 127 130 180 160 167 
221 145 137 194 207 150 254 262 
244 297 137 204 166 174 180 151 
TABLE FOR EXERCISE 58

Canada: 261.1             Japan: 65.1
Mexico: 151.2             South Korea: 34.7
Germany: 54.5             Singapore: 27.9
Taiwan: 24.9              France: 28.8
Netherlands: 39.7         Brazil: 32.3
China: 69.7               Belgium: 28.9
Australia: 22.2           Italy: 15.5
Malaysia: 12.9            Thailand: 9.1
Switzerland: 22.0         Saudi Arabia: 12.5
United Kingdom: 53.6
54. Hospitalization
Number of classes: 6
Data set: The number of days 20 patients remained hospitalized


6 9 7 14 4 5 6 8 4 11
10 6 8 6 5 7 6 6 3 11


55. Heights of Males
Number of classes: 5
Data set: The heights (to the nearest inch) of 30 males


67 76 69 68 72 68 65 63 75 69
66 72 67 66 69 73 64 62 71 73
68 72 71 65 69 66 74 72 68 69


56. Six-Sided Die
Number of classes: 6
Data set: The results of rolling a six-sided die 30 times


1 4 6 1 5 3 2 5 4 6 1 2 4 3 5
6 3 2 1 1 5 6 2 4 4 3 1 6 2 4


57. Coffee Contents During a quality assurance check, the actual coffee
contents (in ounces) of six jars of instant coffee were recorded as 6.03, 5.59,
6.40, 6.00, 5.99, and 6.02.


(a) Find the mean and the median of the coffee content.
(b) The third value was incorrectly measured and is actually 6.04. Find the
mean and median of the coffee content again.


(c) Which measure of central tendency, the mean or the median, was affected
more by the data entry error?


58. U.S. Exports The table at the left shows the U.S. exports (in billions of
dollars) to 19 countries for a recent year. (Source: U.S. Department of
Commerce)


(a) Find the mean and median. 

(b) Find the mean and median without the U.S. exports to Canada. Which 
measure of central tendency, the mean or the median, was affected more 
by the elimination of the Canadian exports? 

(c) The U.S. exports to India were $17.7 billion. Find the mean and median 
with the Indian exports added to the original data set. Which measure of 
central tendency was affected more by adding the Indian exports? 


In Exercises 59 and 60, use StatCrunch to find the sample size, mean, 
median, minimum data value, and maximum data value of the data. 


59. The data represent the amounts (in dollars) made by several families during
a community yard sale.


95 120 125.50 105.25 82 102.75 130 151.50 145.25 79 97


60. The data represent the prices (in dollars) of the stocks in the Dow Jones
Industrial Average during a recent session. (Source: CNN Money)


83.62 15.90 42.61 26.35 16.89 61.46 62.07 79.53 24.99 34.05
69.62 16.77 52.69 21.46 132.39 65.10 44.56 29.08 62.54 39.92
31.07 19.46 57.19 28.30 61.49 49.28 72.77 31.38 54.33 31.06
EXTENDING CONCEPTS


61. Golf The distances (in yards) for nine holes of a golf course are listed.

336 393 408 522 147 504 177 375 360


(a) Find the mean and median of the data.

(b) Convert the distances to feet. Then rework part (a).

(c) Compare the measures you found in part (b) with those found in part
(a). What do you notice?

(d) Use your results from part (c) to explain how to find quickly the mean
and median of the given data set if the distances are measured in inches.


62. Data Analysis A consumer testing service obtained the following mileages
(in miles per gallon) in five test runs performed with three types of compact
cars.

         Run 1   Run 2   Run 3   Run 4   Run 5
Car A:    28      32      28      30      34
Car B:    31      29      31      29      31
Car C:    29      32      28      32      30


(a) The manufacturer of Car A wants to advertise that its car performed
best in this test. Which measure of central tendency (mean, median, or
mode) should be used for its claim? Explain your reasoning.

(b) The manufacturer of Car B wants to advertise that its car performed
best in this test. Which measure of central tendency (mean, median, or
mode) should be used for its claim? Explain your reasoning.

(c) The manufacturer of Car C wants to advertise that its car performed
best in this test. Which measure of central tendency (mean, median, or
mode) should be used for its claim? Explain your reasoning.


63. Midrange Another measure of central tendency that is rarely used but is
easy to calculate is the midrange. It can be found by the formula

$$\frac{(\text{Maximum data entry}) + (\text{Minimum data entry})}{2}.$$

Which of the manufacturers in Exercise 62 would prefer to use the midrange
statistic in their ads? Explain your reasoning.


64. Data Analysis Students in an experimental psychology class did
research on depression as a sign of stress. A test was administered to a
sample of 30 students. The scores are given.

44 51 11 90 76 36 64 37 43 72 53 62 36 74 51
72 37 28 38 61 47 63 36 41 22 37 51 46 85 13

(a) Find the mean and median of the data.

(b) Draw a stem-and-leaf plot for the data using one row per stem.
Locate the mean and median on the display.

(c) Describe the shape of the distribution.


65. Trimmed Mean To find the 10% trimmed mean of a data set, order the
data, delete the lowest 10% of the entries and the highest 10% of the entries,
and find the mean of the remaining entries.

(a) Find the 10% trimmed mean for the data in Exercise 64.

(b) Compare the four measures of central tendency, including the midrange.

(c) What is the benefit of using a trimmed mean versus using a mean found
using all data entries? Explain your reasoning.
Mean Versus Median


The mean versus median applet is designed to allow you to investigate
interactively the mean and the median as measures of the center of a data set.
Points can be added to the plot by clicking the mouse above the horizontal axis. 
The mean of the points is shown as a green arrow and the median is shown as a 
red arrow. If the two values are the same, then a single yellow arrow is 
displayed. Numeric values for the mean and median are shown above the plot. 
Points on the plot can be removed by clicking on the point and then dragging the 
point into the trash can. All of the points on the plot can be removed by 
simply clicking inside the trash can. The range of values for the horizontal axis 
can be specified by inputting lower and upper limits and then clicking UPDATE. 


[Applet display: plot area with Mean and Median readouts, Lower Limit and Upper Limit fields, and an Update button]


Explore


Step 1 Specify a lower limit. 

Step 2 Specify an upper limit. 

Step 3 Add 15 points to the plot. 

Step 4 Remove all of the points from the plot. 


Draw Conclusions


1. Specify the lower limit to be 1 and the upper limit to be 50. Add at least 10
points that range from 20 to 40 so that the mean and the median are the same. 
What is the shape of the distribution? What happens at first to the mean and 
median when you add a few points that are less than 10? What happens over 
time as you continue to add points that are less than 10? 


2. Specify the lower limit to be 0 and the upper limit to be 0.75. Place 10 points 
on the plot. Then change the upper limit to 25. Add 10 more points that are 
greater than 20 to the plot. Can the mean be any one of the points that were 
plotted? Can the median be any one of the points that were plotted? Explain. 




» How to find the range of a data set

» How to find the variance and standard deviation of a population and of a sample

» How to use the Empirical Rule and Chebychev's Theorem to interpret standard deviation

» How to approximate the sample standard deviation for grouped data


INSIGHT 


Both data sets in Example 1 have a mean of 41.5, or $41,500, a median of 41,
or $41,000, and a mode of 41, or $41,000. And yet the two sets differ
significantly. The difference is that the entries in the second set have
greater variation. Your goal in this section is to learn how to measure the
variation of a data set.


Measures of Variation


WHAT YOU SHOULD LEARN 


Range > Deviation, Variance, and Standard Deviation > Interpreting 
Standard Deviation > Standard Deviation for Grouped Data 


>» RANGE 


In this section, you will learn different ways to measure the variation of a data 
set. The simplest measure is the range of the set. 


DEFINITION 


The range of a data set is the difference between the maximum and minimum 
data entries in the set. To find the range, the data must be quantitative. 


Range = (Maximum data entry) — (Minimum data entry) 


EXAMPLE 1 (Report 12)


» Finding the Range of a Data Set 


Two corporations each hired 10 graduates. The starting salaries for each 
graduate are shown. Find the range of the starting salaries for Corporation A. 


Starting Salaries for Corporation A (1000s of dollars)
Salary | 41 | 38 | 39 | 45 | 47 | 41 | 44 | 41 | 37 | 42

Starting Salaries for Corporation B (1000s of dollars)
Salary | 40 | 23 | 41 | 50 | 49 | 32 | 41 | 29 | 52 | 58


> Solution 
Ordering the data helps to find the least and greatest salaries. 


37 38 39 41 41 41 42 44 45 47

Minimum = 37, Maximum = 47


Range = (Maximum salary) − (Minimum salary)
= 47 − 37
= 10
So, the range of the starting salaries for Corporation A is 10, or $10,000. 
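
The range calculation in Example 1 can be sketched in code as follows. The function name `data_range` is a hypothetical helper used only for this illustration.

```python
def data_range(data):
    """Return (maximum entry) - (minimum entry) of a quantitative data set."""
    if not data:
        raise ValueError("data set must not be empty")
    return max(data) - min(data)


# Starting salaries for Corporation A, in thousands of dollars (Example 1).
salaries_a = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]

range_a = data_range(salaries_a)
print(range_a)  # 10, that is, $10,000
```

Note that the function looks at only two entries, the minimum and the maximum, no matter how many entries the data set has.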
> Try It Yourself 1 
Find the range of the starting salaries for Corporation B. 


a. Identify the minimum and maximum salaries. 
b. Find the range. 


c. Compare your answer with that for Example 1. Answer: Page A33 
> DEVIATION, VARIANCE, AND STANDARD DEVIATION


As a measure of variation, the range has the advantage of being easy to compute. 
Its disadvantage, however, is that it uses only two entries from the data set. 
Two measures of variation that use all the entries in a data set are the variance 
and the standard deviation. However, before you learn about these measures of 
variation, you need to know what is meant by the deviation of an entry in a 
data set. 


DEFINITION 


The deviation of an entry x in a population data set is the difference between
the entry and the mean μ of the data set.


Deviation of x = x − μ


EXAMPLE 2


» Finding the Deviations of a Data Set
Find the deviation of each starting salary for Corporation A given in Example 1.


> Solution
The mean starting salary is μ = 415/10 = 41.5, or $41,500. To find out how
much each salary deviates from the mean, subtract 41.5 from the salary. For
instance, the deviation of 41, or $41,000 is

41 − 41.5 = −0.5, or −$500.          Deviation of x = x − μ

The table below lists the deviations of each of the 10 starting salaries.


Deviations of Starting Salaries for Corporation A

Salary (1000s of dollars), x    Deviation (1000s of dollars), x − μ
            41                            −0.5
            38                            −3.5
            39                            −2.5
            45                             3.5
            47                             5.5
            41                            −0.5
            44                             2.5
            41                            −0.5
            37                            −4.5
            42                             0.5
         Σx = 415                     Σ(x − μ) = 0


> Try It Yourself 2
Find the deviation of each starting salary for Corporation B given in Example 1.
a. Find the mean of the data set.
b. Subtract the mean from each salary. Answer: Page A33


In Example 2, notice that the sum of the deviations is zero. Because this is 
true for any data set, it doesn’t make sense to find the average of the deviations. 
To overcome this problem, you can square each deviation. When you add the 
squares of the deviations, you compute a quantity called the sum of squares, 
denoted $SS_x$. In a population data set, the mean of the squares of the deviations
is called the population variance. 
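
The following sketch illustrates why the deviations cannot simply be averaged and how squaring them leads to the sum of squares. It uses the Corporation A salaries from Example 1; the variable names are chosen for this illustration only.

```python
# Starting salaries for Corporation A, in thousands of dollars (Example 1).
salaries_a = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]

mu = sum(salaries_a) / len(salaries_a)            # population mean: 41.5
deviations = [x - mu for x in salaries_a]         # x - mu for each entry

sum_of_deviations = sum(deviations)               # positive and negative cancel
sum_of_squares = sum(d ** 2 for d in deviations)  # SS_x

print(sum_of_deviations)  # 0.0
print(sum_of_squares)     # 88.5
```

Because the positive and negative deviations cancel exactly, their sum carries no information about spread, whereas the squared deviations are all nonnegative and accumulate.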


DEFINITION 


The population variance of a population data set of N entries is

$$\text{Population variance} = \sigma^2 = \frac{\sum (x - \mu)^2}{N}.$$

The symbol σ is the lowercase Greek letter sigma.
Sum of Squares of Starting Salaries for Corporation A

Salary, x    Deviation, x − μ    Squares, (x − μ)²
   41             −0.5                0.25
   38             −3.5               12.25
   39             −2.5                6.25
   45              3.5               12.25
   47              5.5               30.25
   41             −0.5                0.25
   44              2.5                6.25
   41             −0.5                0.25
   37             −4.5               20.25
   42              0.5                0.25
                  Σ = 0            SS_x = 88.5


STUDY TIP


Notice that the variance and standard deviation in Example 3 have one more
decimal place than the original set of data values has. This is the same
round-off rule that was used to calculate the mean.
DEFINITION


The population standard deviation of a population data set of N entries is the
square root of the population variance.

$$\text{Population standard deviation} = \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$


GUIDELINES 


Finding the Population Variance and Standard Deviation 


IN WORDS                                           IN SYMBOLS
1. Find the mean of the population data set.       $\mu = \frac{\sum x}{N}$
2. Find the deviation of each entry.               $x - \mu$
3. Square each deviation.                          $(x - \mu)^2$
4. Add to get the sum of squares.                  $SS_x = \sum (x - \mu)^2$
5. Divide by N to get the population variance.     $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
6. Find the square root of the variance            $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
   to get the population standard deviation.


EXAMPLE 3 


» Finding the Population Standard Deviation 

Find the population standard deviation of the starting salaries for Corporation A 
given in Example 1. 

> Solution 

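The table "Sum of Squares of Starting Salaries for Corporation A" summarizes
the steps used to find $SS_x$.

$$SS_x = 88.5, \quad N = 10, \quad \sigma^2 = \frac{88.5}{10} \approx 8.9, \quad \sigma = \sqrt{\frac{88.5}{10}} \approx 3.0$$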


So, the population variance is about 8.9, and the population standard deviation 
is about 3.0, or $3000. 
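
The six guideline steps for the population variance and standard deviation can be sketched as a short program. The function names below are hypothetical helpers written for this illustration, applied to the Corporation A salaries from Example 1.

```python
import math


def population_variance(data):
    """Return sum((x - mu)^2) / N, treating data as an entire population."""
    n = len(data)
    mu = sum(data) / n                               # step 1: mean
    sum_of_squares = sum((x - mu) ** 2 for x in data)  # steps 2-4
    return sum_of_squares / n                        # step 5: divide by N


def population_std(data):
    """Return the square root of the population variance (step 6)."""
    return math.sqrt(population_variance(data))


# Starting salaries for Corporation A, in thousands of dollars (Example 1).
salaries_a = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]

sigma_squared = population_variance(salaries_a)
sigma = population_std(salaries_a)
print(f"{sigma_squared:.3f}")  # 8.850
print(f"{sigma:.3f}")          # 2.975
```

Rounded to one more decimal place than the data, these agree with the values 8.9 and 3.0 reported in Example 3.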
> Try It Yourself 3 


Find the population variance and standard deviation of the starting salaries for 
Corporation B given in Example 1. 


a. Find the mean and each deviation, as you did in Try It Yourself 2.

b. Square each deviation and add to get the sum of squares.

c. Divide by N to get the population variance.

d. Find the square root of the population variance to get the population
standard deviation.

e. Interpret the results by giving the population standard deviation in dollars.

Answer: Page A33
STUDY TIP


Note that when you find the population variance, you divide by N, the number
of entries, but, for technical reasons, when you find the sample variance,
you divide by n − 1, one less than the number of entries.


DEFINITION


The sample variance and sample standard deviation of a sample data set of n
entries are listed below.

$$\text{Sample variance} = s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

$$\text{Sample standard deviation} = s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$


GUIDELINES


Finding the Sample Variance and Standard Deviation


IN WORDS                                           IN SYMBOLS
1. Find the mean of the sample data set.           $\bar{x} = \frac{\sum x}{n}$
2. Find the deviation of each entry.               $x - \bar{x}$
3. Square each deviation.                          $(x - \bar{x})^2$
4. Add to get the sum of squares.                  $SS_x = \sum (x - \bar{x})^2$
5. Divide by n − 1 to get the sample variance.     $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root of the variance            $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
   to get the sample standard deviation.


[Table: Symbols in Variance and Standard Deviation Formulas]


EXAMPLE 4 (Report 13)


» Finding the Sample Standard Deviation
The starting salaries given in Example 1 are for the Chicago branches of
Corporations A and B. Each corporation has several other branches, and you
plan to use the starting salaries of the Chicago branches to estimate the
starting salaries for the larger populations. Find the sample standard deviation
of the starting salaries for the Chicago branch of Corporation A.

See MINITAB and TI-83/84 Plus steps on pages 122 and 123.


> Solution

$$SS_x = 88.5, \quad n = 10, \quad s^2 = \frac{88.5}{9} \approx 9.8, \quad s = \sqrt{\frac{88.5}{9}} \approx 3.1$$


So, the sample variance is about 9.8, and the sample standard deviation is 
about 3.1, or $3100. 
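
For comparison with the hand calculation, the sketch below uses the `variance` and `stdev` functions in Python's standard `statistics` module, which divide by n − 1 and therefore treat the data as a sample. Using this module is a choice made for this illustration; the text itself works with MINITAB, Excel, and the TI-83/84 Plus.

```python
from statistics import stdev, variance

# Starting salaries for the Chicago branch of Corporation A,
# in thousands of dollars (Example 1), treated here as a sample.
salaries_a = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]

s_squared = variance(salaries_a)  # sum of squares divided by n - 1
s = stdev(salaries_a)             # square root of the sample variance

print(f"{s_squared:.1f}")  # 9.8
print(f"{s:.1f}")          # 3.1
```

The same module provides `pvariance` and `pstdev`, which divide by N instead and correspond to the population formulas.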


> Try It Yourself 4 


Find the sample standard deviation of the starting salaries for the Chicago 
branch of Corporation B. 


a. Find the sum of squares, as you did in Try It Yourself 3. 
b. Divide by n — 1 to get the sample variance. 
c. Find the square root of the sample variance to get the sample standard 
deviation. 
d. Interpret the results by giving the sample standard deviation in dollars. 
Answer: Page A33 
EXAMPLE 5


> Using Technology to Find the Standard Deviation 


Sample office rental rates (in dollars per square foot per year) for Miami’s
central business district are shown in the table. Use a calculator or a computer
to find the mean rental rate and the sample standard deviation. (Adapted from
Cushman & Wakefield Inc.)

35.00  33.50  37.00
23.75  26.50  31.25
36.50  40.00  32.00
39.25  37.50  34.75
37.75  37.25  36.75
27.00  35.75  26.00
37.00  29.00  40.50
24.50  33.00  38.00

> Solution

MINITAB, Excel, and the TI-83/84 Plus each have features that
automatically calculate the means and the standard deviations of data sets. Try
using this technology to find the mean and the standard deviation of the office
rental rates. From the displays, you can see that x̄ ≈ 33.73 and s ≈ 5.09.

MINITAB

Descriptive Statistics: Rental Rates

Variable       N   Mean   SE Mean  StDev  Minimum
Rental Rates   24  33.73  1.04     5.09   23.75

Variable       Q1     Median  Q3     Maximum
Rental Rates   29.56  35.38   37.44  40.50

EXCEL

      A                    B
 1    Mean                 33.72917
 2    Standard Error       1.038864
 3    Median               35.375
 4    Mode                 37
 5    Standard Deviation   5.089373
 6    Sample Variance      25.90172
 7    Kurtosis             -0.74282
 8    Skewness             -0.70345
 9    Range                16.75
10    Minimum              23.75
11    Maximum              40.5
12    Sum                  809.5
13    Count                24

TI-83/84 PLUS

1-Var Stats
 x̄=33.72916667     Sample Mean
 Σx=809.5
 Σx²=27899.5
 Sx=5.089373        Sample Standard Deviation

STUDY TIP
Here are instructions for calculating the sample mean and sample standard
deviation on a TI-83/84 Plus for Example 5.

STAT
Choose the EDIT menu.
1: Edit
Enter the sample office rental rates into L1.
STAT
Choose the CALC menu.
1: 1-Var Stats
ENTER
2nd L1 ENTER
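
The same two statistics can also be found with a general-purpose programming language. The following Python sketch is an addition to the technologies named in the example; it assumes only the standard library `statistics` module, whose `stdev` function divides by n − 1 and so returns the sample standard deviation.

```python
import statistics

# Miami office rental rates from Example 5 (dollars per square foot per year)
rental_rates = [
    35.00, 33.50, 37.00, 23.75, 26.50, 31.25,
    36.50, 40.00, 32.00, 39.25, 37.50, 34.75,
    37.75, 37.25, 36.75, 27.00, 35.75, 26.00,
    37.00, 29.00, 40.50, 24.50, 33.00, 38.00,
]

mean = statistics.mean(rental_rates)   # sample mean
s = statistics.stdev(rental_rates)     # sample standard deviation (n - 1)

print(len(rental_rates))   # 24
print(round(mean, 2))      # 33.73
print(round(s, 2))         # 5.09
```

Use `statistics.pstdev` instead of `statistics.stdev` when the data set is a population rather than a sample.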


> Try It Yourself 5 


Sample office rental rates (in dollars per square foot per year) for Seattle’s 
central business district are listed. Use a calculator or a computer to find the 
mean rental rate and the sample standard deviation. (Adapted from Cushman & 
Wakefield Inc.) 


40.00 43.00 46.00 40.50 35.75 39.75 32.75 
36.75 35.75 38.75 38.75 36.75 38.75 39.00 
29.00 35.00 42.75 32.75 40.75 35.25 


a. Enter the data. 
b. Calculate the sample mean and the sample standard deviation. 
Answer: Page A33 
> INTERPRETING STANDARD DEVIATION

When interpreting the standard deviation, remember that it is a measure of the
typical amount an entry deviates from the mean. The more the entries are spread
out, the greater the standard deviation.

INSIGHT
When all data values are equal, the standard deviation is 0. Otherwise, the
standard deviation must be positive.

[Figure: three histograms of frequency versus data value (1 to 9), each with
x̄ = 5, and with s = 0, s ≈ 1.2, and s ≈ 3.0, respectively.]
EXAMPLE 6

> Estimating Standard Deviation

Without calculating, estimate the population standard deviation of each data set.

To explore this topic further, see Activity 2.4 on page 98.

[Figure: three histograms, numbered 1 to 3, of frequency versus data value
(0 to 7), each with N = 8 and μ = 4.]
> Solution
1. Each of the eight entries is 4. So, each deviation is 0, which implies that
σ = 0.

2. Each of the eight entries has a deviation of ±1. So, the population standard
deviation should be 1. By calculating, you can see that σ = 1.

3. Each of the eight entries has a deviation of ±1 or ±3. So, the population
standard deviation should be about 2. By calculating, you can see that
σ ≈ 2.24.
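
These estimates can be confirmed by calculation. In the Python sketch below, the three lists are assumptions reconstructed from the descriptions in the solution (eight entries, mean 4, and the stated deviations); they are not read directly from the graphs.

```python
import statistics

# Assumed data sets, each with N = 8 and mean 4
data_1 = [4] * 8                      # every deviation is 0
data_2 = [3] * 4 + [5] * 4            # every deviation is +1 or -1
data_3 = [1, 1, 3, 3, 5, 5, 7, 7]     # deviations of +/-1 and +/-3

# pstdev divides by N, so it gives the population standard deviation
sigma_1 = statistics.pstdev(data_1)
sigma_2 = statistics.pstdev(data_2)
sigma_3 = statistics.pstdev(data_3)

print(sigma_1, sigma_2, round(sigma_3, 2))  # 0.0 1.0 2.24
```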


> Try It Yourself 6 


Write a data set that has 10 entries, a mean of 10, and a population standard 
deviation that is approximately 3. (There are many correct answers.) 


a. Write a data set that has five entries that are three units less than 10 and 
five entries that are three units more than 10. 

b. Calculate the population standard deviation to check that σ is 
approximately 3. Answer: Page A33 
A survey was conducted by the National Center for Health Statistics to find
the mean height of males in the United States. The histogram shows the
distribution of heights for the 808 men examined in the 20-29 age group. In
this group, the mean was 69.9 inches and the standard deviation was 3.0 inches.
(Adapted from National Center for Health Statistics)

[Figure: Heights of Men in the U.S., Ages 20-29. Histogram of relative
frequency (in percent) versus height (in inches).]


Roughly which two heights 
contain the middle 95% of 
the data? 


[Figure: Heights of Women in the U.S., Ages 20-29. Bell-shaped distribution of
height (in inches) with x̄ − 3s = 56.44, x̄ − 2s = 59.06, x̄ − s = 61.68,
x̄ = 64.3, x̄ + s = 66.92, x̄ + 2s = 69.54, and x̄ + 3s = 72.16.]


INSIGHT 


Data values that lie more than 
two standard deviations from 

the mean are considered unusual. 
Data values that lie more than 
three standard deviations from 
the mean are very unusual. 
DESCRIPTIVE STATISTICS 


Many real-life data sets 
have distributions that are 
approximately symmetric 
and bell-shaped. Later in the 
text, you will study this type 
of distribution in detail. For 
now, however, the following 
Empirical Rule can help 
you see how valuable the 
standard deviation can be 
as a measure of variation. 


[Figure: Bell-Shaped Distribution, showing 68% of the data within 1 standard
deviation of the mean, 95% within 2 standard deviations, and 99.7% within
3 standard deviations.]


EMPIRICAL RULE (OR 68-95-99.7 RULE) 


For data with a (symmetric) bell-shaped distribution, the standard deviation 
has the following characteristics. 


1. About 68% of the data lie within one standard deviation of the mean. 
2. About 95% of the data lie within two standard deviations of the mean. 


3. About 99.7% of the data lie within three standard deviations of 
the mean. 


EXAMPLE 7 


> Using the Empirical Rule 


In a survey conducted by the National Center for Health Statistics, the sample 
mean height of women in the United States (ages 20-29) was 64.3 inches, with 
a sample standard deviation of 2.62 inches. Estimate the percent of women 
whose heights are between 59.06 inches and 64.3 inches. (Adapted from National 
Center for Health Statistics) 


> Solution 


The distribution of women’s heights is shown. Because the distribution is 
bell-shaped, you can use the Empirical Rule. The mean height is 64.3, so when 
you subtract two standard deviations from the mean height, you get 


x̄ − 2s = 64.3 − 2(2.62) = 59.06.

Because 59.06 is two standard deviations below the mean height, the percent
of the heights between 59.06 and 64.3 inches is 13.5% + 34% = 47.5%.

Interpretation So, 47.5% of women are between 59.06 and 64.3 inches tall.
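
The percents 34% and 13.5% used in this solution follow from the Empirical Rule and the symmetry of a bell-shaped distribution: half of the 68% lies on each side of the mean, and half of the remaining 95% − 68% lies in each band between one and two standard deviations. A short Python sketch of that reasoning, with illustrative variable names, is shown here.

```python
mean, s = 64.3, 2.62          # sample statistics from Example 7 (inches)

within_1 = 68.0               # percent within 1 standard deviation
within_2 = 95.0               # percent within 2 standard deviations

# By symmetry, split each region evenly about the mean.
band_0_to_1 = within_1 / 2                 # mean to one s on either side
band_1_to_2 = (within_2 - within_1) / 2    # between one s and two s on either side

lower = mean - 2 * s                       # two standard deviations below the mean
percent = band_1_to_2 + band_0_to_1        # from x-bar - 2s up to x-bar

print(round(lower, 2))            # 59.06
print(band_0_to_1, band_1_to_2)   # 34.0 13.5
print(percent)                    # 47.5
```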


> Try It Yourself 7 


Estimate the percent of women’s heights that are between 64.3 and 66.92 inches 
tall. 


a. How many standard deviations is 66.92 to the right of 64.3? 
b. Use the Empirical Rule to estimate the percent of the data between x̄ and 
x̄ + s. 


c. Interpret the result in the context of the data. Answer: Page A33 
INSIGHT 


In Example 8, Chebychev's 
Theorem gives you an inequality 
statement that says that at least 
75% of the population of Florida 
is under the age of 88.8. This is a 
true statement, but it is not nearly 
as strong a statement as could be 
made from reading the histogram. 


In general, Chebychev's Theorem 
gives the minimum percent 

of data values that fall 
within the given number 
of standard deviations 
of the mean. 
Depending on the 
distribution, there is 
probably a higher percent 
of data falling in the 
given range. 
The Empirical Rule applies only to (symmetric) bell-shaped distributions. 


What if the distribution is not bell-shaped, or what if the shape of the distribution 
is not known? The following theorem gives an inequality statement that applies 
to all distributions. It is named after the Russian statistician Pafnuti Chebychev 
(1821-1894). 


CHEBYCHEV’S THEOREM 


The portion of any data set lying within k standard deviations (k > 1) of
the mean is at least

$$1 - \frac{1}{k^2}.$$

• k = 2: In any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, or 75%, of the data lie within
2 standard deviations of the mean.

• k = 3: In any data set, at least $1 - \frac{1}{3^2} = \frac{8}{9}$, or 88.9%, of the data lie within
3 standard deviations of the mean.
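
The bound in Chebychev's Theorem is easy to tabulate. The function below is a minimal Python sketch; its name, `chebychev_min_portion`, is hypothetical and chosen only for this illustration.

```python
def chebychev_min_portion(k):
    """Minimum portion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem requires k > 1")
    return 1 - 1 / k**2

for k in (2, 3):
    # Prints the minimum percent for k = 2 (75.0) and k = 3 (88.9)
    print(k, round(100 * chebychev_min_portion(k), 1))
```

Because the theorem makes no assumption about shape, these values are lower bounds; a bell-shaped data set typically has a larger percent within the same number of standard deviations.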


EXAMPLE 8 


» Using Chebychev’s Theorem 


The age distributions for Alaska and Florida are shown in the histograms. 
Decide which is which. Apply Chebychev’s Theorem to the data for Florida 
using k = 2. What can you conclude? 


[Figure: two histograms of population (in thousands) versus age (in years),
with age classes centered at 5, 15, 25, 35, 45, 55, 65, 75, and 85.]


> Solution 


The histogram on the right shows Florida’s age distribution. You can tell
because the population is greater and older. Moving two standard
deviations to the left of the mean puts you below 0, because
μ − 2σ = 39.2 − 2(24.8) = −10.4. Moving two standard deviations to the
right of the mean puts you at μ + 2σ = 39.2 + 2(24.8) = 88.8. By
Chebychev’s Theorem, you can say that at least 75% of the population of
Florida is between 0 and 88.8 years old.
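
The interval in this solution can be reproduced as follows. This Python sketch uses the Florida values μ = 39.2 and σ = 24.8 from the solution; replacing the negative lower endpoint with 0 reflects the fact that an age cannot be negative.

```python
mu, sigma, k = 39.2, 24.8, 2   # Florida mean age, standard deviation, and k

lower = mu - k * sigma         # two standard deviations below the mean
upper = mu + k * sigma         # two standard deviations above the mean
min_percent = 100 * (1 - 1 / k**2)

# Ages cannot be negative, so the interval effectively starts at 0.
lower_age = max(0.0, lower)

print(round(lower, 1), round(upper, 1))   # -10.4 88.8
print(lower_age, min_percent)             # 0.0 75.0
```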


> Try It Yourself 8 


Apply Chebychev’s Theorem to the data for Alaska using k = 2. What can 
you conclude? 


a. Subtract two standard deviations from the mean. 
b. Add two standard deviations to the mean. 
c. Apply Chebychev’s Theorem for k = 2 and interpret the results. 
Answer: Page A33 
STUDY TIP
Remember that formulas for grouped data require you to multiply by the
frequencies.

[Table: the 50 sample entries for the number of children per household used in
Example 9.]

STUDY TIP
Here are instructions for calculating the sample mean and sample standard
deviation on a TI-83/84 Plus for the grouped data in Example 9.

STAT
Choose the EDIT menu.
1: Edit
Enter the values of x into L1.
Enter the frequencies f into L2.
STAT
Choose the CALC menu.
1: 1-Var Stats
ENTER
2nd L1 , 2nd L2 ENTER
> STANDARD DEVIATION FOR GROUPED DATA 


In Section 2.1, you learned that large data sets are usually best represented by 
frequency distributions. The formula for the sample standard deviation for a 
frequency distribution is 


$$\text{Sample standard deviation} = s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$$

where $n = \sum f$ is the number of entries in the data set.


EXAMPLE 9 


> Finding the Standard Deviation for Grouped Data 


You collect a random sample of the number of children per household in a 
region. The results are shown at the left. Find the sample mean and the sample 
standard deviation of the data set. 


> Solution 


These data could be treated as 50 individual entries, and you could use the 
formulas for mean and standard deviation. Because there are so many 
repeated numbers, however, it is easier to use a frequency distribution. 


x    f     xf    x − x̄    (x − x̄)²    (x − x̄)²f
0    10    0     −1.8     3.24        32.40
1    19    19    −0.8     0.64        12.16
2    7     14    0.2      0.04        0.28
3    7     21    1.2      1.44        10.08
4    2     8     2.2      4.84        9.68
5    1     5     3.2      10.24       10.24
6    4     24    4.2      17.64       70.56
     Σ = 50      Σ = 91               Σ = 145.40

$$\bar{x} = \frac{\sum xf}{n} = \frac{91}{50} \approx 1.8 \qquad \text{Sample mean}$$

Use the sum of squares to find the sample standard deviation.

$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{145.4}{49}} \approx 1.7 \qquad \text{Sample standard deviation}$$


So, the sample mean is about 1.8 children, and the sample standard deviation 
is about 1.7 children. 
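
The grouped-data calculation can be sketched in Python as shown below. The code works from the frequency distribution in the table; it keeps the unrounded mean 1.82, so its sum of squares (about 145.38) differs slightly from the table's 145.40, which uses the rounded mean 1.8. The rounded results agree.

```python
import math

values = [0, 1, 2, 3, 4, 5, 6]     # x: number of children per household
freqs = [10, 19, 7, 7, 2, 1, 4]    # f: number of households with that value

n = sum(freqs)                                        # number of entries
mean = sum(x * f for x, f in zip(values, freqs)) / n  # sum of xf, divided by n

# Each squared deviation is multiplied by its frequency.
ss = sum((x - mean) ** 2 * f for x, f in zip(values, freqs))
s = math.sqrt(ss / (n - 1))

print(n)                # 50
print(round(mean, 1))   # 1.8
print(round(s, 1))      # 1.7
```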


> Try It Yourself 9 


Change three of the 6’s in the data set to 4’s. How does this change affect the 
sample mean and sample standard deviation? 


a. Write the first three columns of a frequency distribution. 

b. Find the sample mean. 

c. Complete the last three columns of the frequency distribution. 

d. Find the sample standard deviation. Answer: Page A33 
When a frequency distribution has classes, you can estimate the sample mean 
and the sample standard deviation by using the midpoint of each class. 


EXAMPLE 10 


» Using Midpoints of Classes 


The circle graph at the right shows the results of a survey in which
1000 adults were asked how much they spend in preparation for personal
travel each year. Make a frequency distribution for the data. Then use the
table to estimate the sample mean and the sample standard deviation of the
data set. (Adapted from Travel Industry Association of America)


> Solution 
Begin by using a frequency distribution to organize the data. 


Class      x       f      xf        x − x̄     (x − x̄)²     (x − x̄)²f
0-99       49.5    380    18,810    −142.5    20,306.25    7,716,375.0
100-199    149.5   230    34,385    −42.5     1806.25      415,437.5
200-299    249.5   210    52,395    57.5      3306.25      694,312.5
300-399    349.5   50     17,475    157.5     24,806.25    1,240,312.5
400-499    449.5   60     26,970    257.5     66,306.25    3,978,375.0
500+       599.5   70     41,965    407.5     166,056.25   11,623,937.5
                   Σ = 1000   Σ = 192,000              Σ = 25,668,750.0

$$\bar{x} = \frac{\sum xf}{n} = \frac{192{,}000}{1000} = 192 \qquad \text{Sample mean}$$

Use the sum of squares to find the sample standard deviation.

$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{25{,}668{,}750}{999}} \approx 160.3 \qquad \text{Sample standard deviation}$$

STUDY TIP
When a class is open, as in the last class, you must assign a single value to
represent the midpoint. For this example, we selected 599.5.


So, the sample mean is $192 per year, and the sample standard deviation is 
about $160.30 per year. 
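
The midpoint method follows the same pattern as any grouped-data calculation, with each class midpoint standing in for every entry in its class. The Python sketch below uses the midpoints and frequencies from the table, including 599.5 as the chosen value for the open class.

```python
import math

midpoints = [49.5, 149.5, 249.5, 349.5, 449.5, 599.5]  # x: class midpoints (dollars)
freqs = [380, 230, 210, 50, 60, 70]                    # f: adults in each class

n = sum(freqs)
mean = sum(x * f for x, f in zip(midpoints, freqs)) / n

ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))
s = math.sqrt(ss / (n - 1))

print(n, mean)        # 1000 192.0
print(round(s, 1))    # 160.3
```

Changing the last entry of `midpoints` shows how sensitive the estimates are to the value chosen for an open class.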


> Try It Yourself 10 


In the frequency distribution, 599.5 was chosen to represent the class of $500 
or more. How would the sample mean and standard deviation change if you 
used 650 to represent this class? 


a. Write the first four columns of a frequency distribution. 

b. Find the sample mean. 

c. Complete the last three columns of the frequency distribution. 

d. Find the sample standard deviation. Answer: Page A34 
BUILDING BASIC SKILLS AND VOCABULARY 


1. Explain how to find the range of a data set. What is an advantage of using 
the range as a measure of variation? What is a disadvantage? 


2. Explain how to find the deviation of an entry in a data set. What is the sum 
of all the deviations in any data set? 


3. Why is the standard deviation used more frequently than the variance? 
(Hint: Consider the units of the variance.) 


4. Explain the relationship between variance and standard deviation. Can 
either of these measures be negative? Explain. 


5. Construct a sample data set for which n = 7, x̄ = 9, and s = 0. 
6. Construct a population data set for which N = 6, μ = 5, and σ = 2. 


7. Describe the difference between the calculation of population standard 
deviation and that of sample standard deviation. 


8. Given a data set, how do you know whether to calculate σ or s? 


9. Discuss the similarities and the differences between the Empirical Rule and 
Chebychev’s Theorem. 


10. What must you know about a data set before you can use the Empirical 
Rule? 


In Exercises 11 and 12, find the range, mean, variance, and standard deviation of 
the population data set. 


11. 9 5 9 10 11 12 7 7 8 12


12. 18 20 19 21 19 17 15
    17 25 22 19 20 16 18


In Exercises 13 and 14, find the range, mean, variance, and standard deviation of
the sample data set.


13. 4 15 9 12 16 8 11 19 14


14. 28 25 21 15 7 14 9
    27 21 24 14 17 16


Graphical Reasoning In Exercises 15-18, find the range of the data set
represented by the display or graph.


15. 2 | 3 9              Key: 2|3 = 23
    3 | 0 0 2 3 6 7
    4 | 0 1 2 3 3 8
    5 | 0 1 1 9
    6 | 1 2 9 9
    7 | 5 9
    8 | 4 8
    9 | 0 2 5 6

16. [Figure: Bride’s Age at First Marriage. Graph with ages (in years) from
    24 to 34.]
SECTION 2.4 MEASURES OF VARIATION 91

17. [Figure]

18. 0 | 5 5 9            Key: 0|5 = 0.5
    1 | 1 3 4 6 9
    2 | 2 5 7 9 9
    3 | 0 1 5 5 5 5
    4 | 7 7 9
    5 |
    6 | 3 4 7


19. Archaeology The depths (in inches) at which 10 artifacts are found are 
given below. 


20.7 24.8 30.5 26.2 36.0 34.3 30.3 29.5 27.0 38.5 


(a) Find the range of the data set. 
(b) Change 38.5 to 60.5 and find the range of the new data set. 


20. In Exercise 19, compare your answer to part (a) with your answer to part (b). 
How do outliers affect the range of a data set? 


USING AND INTERPRETING CONCEPTS 


21. Graphical Reasoning Both data sets have a mean of 165. One has a
standard deviation of 16, and the other has a standard deviation of 24. By
looking at the graphs, which is which? Explain your reasoning.


(a) 12 | 8 9        Key: 12|8 = 128     (b) 12 |                Key: 13|1 = 131
    13 | 5 5 8                              13 | 1
    14 | 1 2                                14 | 2 3 5
    15 | 0 0 6 7                            15 | 0 4 5 6 8
    16 | 4 5 9                              16 | 1 1 2 3 3 3
    17 | 1 3 6 8                            17 | 1 5 8 8
    18 | 0 8 9                              18 | 2 3 4 5
    19 | 6                                  19 | 0 2
    20 | 3 5 7                              20 |
22. Graphical Reasoning Both data sets represented below have a mean of 50.
One has a standard deviation of 2.4, and the other has a standard deviation
of 5. By looking at the graphs, which is which? Explain your reasoning.

[Figure: two frequency histograms, (a) and (b), with data values from 42 to 60.]


23. Salary Offers You are applying for jobs at two companies. Company A
offers starting salaries with μ = $31,000 and σ = $1000. Company B offers
starting salaries with μ = $31,000 and σ = $5000. From which company are
you more likely to get an offer of $33,000 or more? Explain your reasoning.

24. Golf Strokes An Internet site compares the strokes per round for two
professional golfers. Which golfer is more consistent: Player A with
μ = 71.5 strokes and σ = 2.3 strokes, or Player B with μ = 70.1 strokes and
σ = 1.2 strokes? Explain your reasoning.


Comparing Two Data Sets In Exercises 25-28, you are asked to compare
two data sets and interpret the results.


25. Annual Salaries Sample annual salaries (in thousands of dollars) for 
accountants in Dallas and New York City are listed. 


Dallas: 41.6 50.0 49.5 38.7 39.9 45.8 44.7 47.8 40.5 
New York City: 45.6 41.5 57.6 55.1 59.3 59.0 50.6 47.2 42.3 


(a) Find the mean, median, range, variance, and standard deviation of each 
data set. 
(b) Interpret the results in the context of the real-life setting. 


26. Annual Salaries Sample annual salaries (in thousands of dollars) for 
electrical engineers in Boston and Chicago are listed. 


Boston: 70.4 84.2 58.5 64.5 71.6 79.9 88.3 80.1 69.9 
Chicago: 69.4 71.5 65.4 59.9 70.9 68.5 62.9 70.1 60.9 


(a) Find the mean, median, range, variance, and standard deviation of each 
data set. 
(b) Interpret the results in the context of the real-life setting. 


27. SAT Scores Sample SAT scores for eight males and eight females are listed. 


Male SAT scores: 1520 1750 2120 1380 1982 1645 1033 1714 
Female SAT scores: 1785 1507 1497 1952 2210 1871 1263 1588 


(a) Find the mean, median, range, variance, and standard deviation of each 
data set. 
(b) Interpret the results in the context of the real-life setting. 


28. Batting Averages Sample batting averages for baseball players from two 
opposing teams are listed. 


Team A: 0.295 0.310 0.325 0.272 0.256 0.297 0.320 0.384 0.235 
Team B: 0.285 0.305 0.315 0.270 0.292 0.330 0.335 0.268 0.290 


(a) Find the mean, median, range, variance, and standard deviation of each 
data set. 


(b) Interpret the results in the context of the real-life setting. 


Reasoning with Graphs In Exercises 29-32, you are asked to compare three
data sets. (a) Without calculating, determine which data set has the greatest sample
standard deviation and which has the least sample standard deviation. Explain
your reasoning. (b) How are the data sets the same? How do they differ?


29. [Figure: three frequency histograms, (i), (ii), and (iii), with data values
    from 4 to 10.]


31. (i)  0 | 9            (ii)  0 | 9              (iii)  0 |
         1 | 5 8                1 | 5                     1 | 5
         2 | 3 3 7 7            2 | 3 3 3 7 7 7           2 | 3 3 3 3 7 7 7 7
         3 | 2 5                3 | 5                     3 | 5
         4 | 1                  4 | 1                     4 |
         Key: 1|5 = 15          Key: 1|5 = 15             Key: 1|5 = 15

[Figure: three dot plots, (i), (ii), and (iii), with data values from 10 to 14.]

[Figure: three dot plots, (i), (ii), and (iii), with data values from 1 to 8.]


Using the Empirical Rule In Exercises 33-38, you are asked to use the
Empirical Rule.


33. The mean value of land and buildings per acre from a sample of farms is 
$1500, with a standard deviation of $200. Estimate the percent of farms 
whose land and building values per acre are between $1300 and $1700. 
(Assume the data set has a bell-shaped distribution.) 


34. The mean value of land and buildings per acre from a sample of farms is 
$2400, with a standard deviation of $450. Between what two values do about 
95% of the data lie? (Assume the data set has a bell-shaped distribution.) 


35. Using the sample statistics from Exercise 33, do the following. (Assume the 
number of farms in the sample is 75.) 


(a) Estimate the number of farms whose land and building values per acre 
are between $1300 and $1700. 


(b) If 25 additional farms were sampled, about how many of these farms 
would you expect to have land and building values between $1300 per 
acre and $1700 per acre? 


36. Using the sample statistics from Exercise 34, do the following. (Assume the 
number of farms in the sample is 40.) 


(a) Estimate the number of farms whose land and building values per acre 
are between $1500 and $3300. 

(b) If 20 additional farms were sampled, about how many of these farms 
would you expect to have land and building values between $1500 per 
acre and $3300 per acre? 


37. The land and building values per acre for eight more farms are listed. Using 
the sample statistics from Exercise 33, determine which of the data values are 
unusual. Are any of the data values very unusual? Explain. 


$1150, $1775, $2180, $1000, $1475, $2000, $1850, $950 
38. The land and building values per acre for eight more farms are listed. Using
the sample statistics from Exercise 34, determine which of the data values are
unusual. Are any of the data values very unusual? Explain.


$3325, $1045, $2450, $3200, $3800, $1490, $1675, $2950


39. Chebychev’s Theorem Old Faithful is a famous geyser at Yellowstone
National Park. From a sample with n = 32, the mean duration of 
Old Faithful’s eruptions is 3.32 minutes and the standard deviation is 
1.09 minutes. Using Chebychev’s Theorem, determine at least how many of 
the eruptions lasted between 1.14 minutes and 5.5 minutes. (Source: 
Yellowstone National Park) 


40. Chebychev’s Theorem The mean time in a women’s 400-meter dash is 
57.07 seconds, with a standard deviation of 1.05 seconds. Apply Chebychev’s 
Theorem to the data using k = 2. Interpret the results. 


Calculating Using Grouped Data In Exercises 41-48, use the grouped data
formulas to find the indicated mean and standard deviation.


41. Pets per Household The results of a random sample of the number of pets
per household in a region are shown in the histogram. Estimate the sample
mean and the sample standard deviation of the data set.

[Figure: histogram of number of households versus number of pets.]


42. Cars per Household The results of a random sample of the number of cars
per household in a region are shown in the histogram. Estimate the sample
mean and the sample standard deviation of the data set.

[Figure: histogram of number of households versus number of cars.]


43. Football Wins The number of regular season wins for each National 
Football League team in 2009 are listed. Make a frequency distribution 
(using five classes) for the data set. Then approximate the population 
mean and the population standard deviation of the data set. (Source: 
National Football League) 


10 9 7 6 10 9 9 5 14 9 8 7
13 8 5 4 11 11 8 4 12 11 7 2
13 9 8 3 10 8 5 1
44. Water Consumption The number of gallons of water consumed per 
day by a small village are listed. Make a frequency distribution (using 
five classes) for the data set. Then approximate the population mean 
and the population standard deviation of the data set. 


167 180 192 173 145 151 174 175 178 160 
195 224 244 146 162 146 177 163 149 188 


45. Amounts of Caffeine The amounts of caffeine in a sample of five-ounce
servings of brewed coffee are shown in the histogram. Make a frequency
distribution for the data. Then use the table to estimate the sample mean
and the sample standard deviation of the data set.

[Figure: histogram of number of 5-ounce servings versus caffeine (in
milligrams), with class midpoints 70.5, 92.5, 114.5, 136.5, and 158.5.]


46. Supermarket Trips Thirty people were randomly selected and asked how
many trips to the supermarket they had made in the past week. The responses
are shown in the histogram. Make a frequency distribution for the data. Then
use the table to estimate the sample mean and the sample standard deviation
of the data set.

[Figure: histogram of number responding versus number of supermarket trips
(0 to 4).]


47. U.S. Population The estimated distribution (in millions) of the U.S. population
by age for the year 2015 is shown in the pie chart. Make a frequency distribution
for the data. Then use the table to estimate the sample mean and the sample
standard deviation of the data set. Use 70 as the midpoint for “65 years and
over.” (Source: Population Division, U.S. Census Bureau)

[Figure: pie chart of the U.S. population by age group; the legible class
labels include 15-19, 20-24 years, 25-34 years, 35-44, and 65 years and over.]
48. Brazil’s Population Brazil’s estimated population for the year 2015 is
shown in the histogram. Make a frequency distribution for the data. Then
use the table to estimate the sample mean and the sample standard deviation
of the data set. (Adapted from U.S. Census Bureau, International Data Base)

[Figure: histogram of population (in millions) versus age (in years), with
class midpoints 4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, and 94.5.]


In Exercises 49 and 50, use StatCrunch to find the sample size, mean, 
variance, standard deviation, median, range, minimum data value, and maximum 
data value of the data. 


49. The data represent the total amounts (in dollars) spent by several families at 
a restaurant. 


49 56 75 64 55 49 62 89 30 34 60 52 60 72 75 


50. The data represent the prices (in dollars) of several Hewlett-Packard office 
printers. (Source: Hewlett-Packard) 


199.99 499.99 149.99 119.99 129.99 229.99 
179.99 89.99 299.99 249.99 349.99 99.99 


EXTENDING CONCEPTS 


51. Coefficient of Variation The coefficient of variation CV describes
the standard deviation as a percent of the mean. Because it has no
units, you can use the coefficient of variation to compare data with
different units.

$$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$$

The table at the left shows the heights (in inches) and weights (in
pounds) of the members of a basketball team. Find the coefficient of
variation for each data set. What can you conclude?

Heights       Weights
72            180
74            168
68            225
76            201
74            189
69            192
72            197
[illegible]   [illegible]
70            174
69            171
[illegible]   185
[illegible]   210

TABLE FOR EXERCISE 51

52. Shortcut Formula You used $SS_x = \sum (x - \bar{x})^2$ when calculating variance
and standard deviation. An alternative formula that is sometimes more
convenient for hand calculations is

$$SS_x = \sum x^2 - \frac{(\sum x)^2}{n}.$$

You can find the sample variance by dividing the sum of squares by n − 1
and the sample standard deviation by finding the square root of the sample
variance.


(a) Use the shortcut formula to calculate the sample standard deviations for 
the data sets given in Exercise 27. 


(b) Compare your results with those obtained in Exercise 27. 
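
To see that the shortcut formula and the definition give the same sum of squares, the following Python sketch applies both to the Miami office rental rates from Example 5 rather than to the exercise data. The variable names are illustrative only.

```python
import math

# Miami office rental rates from Example 5 (dollars per square foot per year)
rates = [
    35.00, 33.50, 37.00, 23.75, 26.50, 31.25,
    36.50, 40.00, 32.00, 39.25, 37.50, 34.75,
    37.75, 37.25, 36.75, 27.00, 35.75, 26.00,
    37.00, 29.00, 40.50, 24.50, 33.00, 38.00,
]
n = len(rates)
mean = sum(rates) / n

# Definition: sum of squared deviations from the mean
ss_definition = sum((x - mean) ** 2 for x in rates)

# Shortcut: sum of x squared, minus (sum of x) squared over n
sum_x = sum(rates)
sum_x2 = sum(x ** 2 for x in rates)
ss_shortcut = sum_x2 - sum_x ** 2 / n

s = math.sqrt(ss_shortcut / (n - 1))

print(sum_x, sum_x2)                                     # 809.5 27899.5
print(round(ss_definition, 2), round(ss_shortcut, 2))    # 595.74 595.74
print(round(s, 2))                                       # 5.09
```

The shortcut needs only the running totals Σx and Σx², which is why calculators report both in their one-variable statistics displays.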
53. Scaling Data Sample annual salaries (in thousands of dollars) for 
employees at a company are listed. 


42 36 48 51 39 39 42 36 48 33 39 42 45 


(a) Find the sample mean and sample standard deviation. 


(b) Each employee in the sample is given a 5% raise. Find the sample mean 
and sample standard deviation for the revised data set. 


(c) To calculate the monthly salary, divide each original salary by 12. Find the 
sample mean and sample standard deviation for the revised data set. 


(d) What can you conclude from the results of (a), (b), and (c)? 


54. Shifting Data Sample annual salaries (in thousands of dollars) for 
employees at a company are listed. 


40 35 49 53 38 39 40 37 49 34 38 43 47 


(a) Find the sample mean and sample standard deviation. 
(b) Each employee in the sample is given a $1000 raise. Find the sample 
mean and sample standard deviation for the revised data set. 


(c) Each employee in the sample takes a pay cut of $2000 from their 
original salary. Find the sample mean and sample standard deviation for 
the revised data set. 


(d) What can you conclude from the results of (a), (b), and (c)? 


55. Mean Absolute Deviation Another useful measure of variation for a data
set is the mean absolute deviation (MAD). It is calculated by the formula

$$\text{MAD} = \frac{\sum |x - \bar{x}|}{n}.$$


(a) Find the mean absolute deviations of the data sets in Exercise 27. 
Compare your results with the sample standard deviation. 


(b) Find the mean absolute deviations of the data sets in Exercise 28. 
Compare your results with the sample standard deviation. 
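
As an illustration of the formula, the Python sketch below computes the mean absolute deviation of a small hypothetical data set (not one of the exercise data sets) and prints the sample standard deviation next to it for comparison.

```python
import statistics

data = [2, 4, 4, 6, 9]          # hypothetical sample, chosen so the mean is 5

n = len(data)
mean = sum(data) / n

# Mean absolute deviation: average distance of the entries from the mean
mad = sum(abs(x - mean) for x in data) / n

s = statistics.stdev(data)      # sample standard deviation, for comparison

print(mean, mad)      # 5.0 2.0
print(round(s, 2))    # 2.65
```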


56. Chebychev’s Theorem At least 99% of the data in any data set lie within 
how many standard deviations of the mean? Explain how you obtained your 
answer. 


57. Pearson’s Index of Skewness The English statistician Karl Pearson
(1857-1936) introduced a formula for the skewness of a distribution.

$$P = \frac{3(\bar{x} - \text{median})}{s} \qquad \text{Pearson’s index of skewness}$$


Most distributions have an index of skewness between —3 and 3. When 
P > 0, the data are skewed right. When P < 0, the data are skewed left. 
When P = 0, the data are symmetric. Calculate the coefficient of skewness 
for each distribution. Describe the shape of each. 


(a) x̄ = 17, s = 2.3, median = 19
(b) x̄ = 32, s = 5.1, median = 25
(c) x̄ = 9.2, s = 1.8, median = 9.2
(d) x̄ = 42, s = 6.0, median = 40
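
The index can be evaluated with a short function. In the Python sketch below, the function name `pearson_skewness_index` and the statistics passed to it are hypothetical; they are not the values from parts (a) through (d).

```python
def pearson_skewness_index(mean, median, s):
    """Pearson's index of skewness: P = 3(mean - median) / s."""
    if s <= 0:
        raise ValueError("the standard deviation must be positive")
    return 3 * (mean - median) / s

def describe_shape(p):
    """Classify a distribution by the sign of its index of skewness."""
    if p > 0:
        return "skewed right"
    if p < 0:
        return "skewed left"
    return "symmetric"

# Hypothetical statistics: mean 50, median 47, standard deviation 10
p = pearson_skewness_index(mean=50, median=47, s=10)
print(p, describe_shape(p))   # 0.9 skewed right
```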
Standard Deviation


The standard deviation applet is designed to allow you to investigate interactively
the standard deviation as a measure of spread for a data set. Points can be added
to the plot by clicking the mouse above the horizontal axis. The mean of the
points is shown as a green arrow. A numeric value for the standard deviation is
shown above the plot. Points on the plot can be removed by clicking on the point
and then dragging the point into the trash can. All of the points on the plot can
be removed by simply clicking inside the trash can. The range of values for the
horizontal axis can be specified by inputting lower and upper limits and then
clicking UPDATE.


Explore 


Step 1 Specify a lower limit. 

Step 2 Specify an upper limit. 

Step 3 Add 15 points to the plot. 

Step 4 Remove all of the points from the plot. 


Draw Conclusions


1. Specify the lower limit to be 10 and the upper limit to be 20. Plot 10 points 
that have a mean of about 15 and a standard deviation of about 3. Write 
the estimates of the values of the points. Plot a point with a value of 15. What 
happens to the mean and standard deviation? Plot a point with a value of 20. 
What happens to the mean and standard deviation? 


2. Specify the lower limit to be 30 and the upper limit to be 40. How can you plot 
eight points so that the points have the largest possible standard deviation? 
Use the applet to plot the set of points and then use the formula for standard 
deviation to confirm the value given in the applet. How can you plot eight 
points so that the points have the lowest possible standard deviation? Explain. 


Earnings of Athletes 


The earnings of professional athletes in different sports can vary. An athlete can be paid a base salary, 
earn signing bonuses upon signing a new contract, or even earn money by finishing in a certain 
position in a race or tournament. The data shown below are the earnings (for performance only, no 
endorsements) from Major League Baseball (MLB), Major League Soccer (MLS), the National 
Basketball Association (NBA), the National Football League (NFL), the National Hockey League 
(NHL), the National Association for Stock Car Auto Racing (NASCAR), and the Professional Golf 
Association Tour (PGA) for a recent year. 


Organization   Number of players
MLB            858
MLS            410
NBA            463
NFL            1861
NHL            722
NASCAR         76
PGA            262


Number of Players Separated into Earnings Ranges


Organization   $0-$500,000   $500,001-$2,000,000   $2,000,001-$6,000,000   $6,000,001-$10,000,000   $10,000,001+
MLB            353           182                   164                     85                       74
MLS            403           5                     1                       1                        0
NBA            35            157                   137                     77                       57
NFL            554           746                   438                     85                       38
NHL            42            406                   237                     37                       0
NASCAR         23            16                    31                      6                        0
PGA            110           115                   36                      1                        0


EXERCISES


1. Revenue Which organization had the greatest total player earnings? Explain
your reasoning.

2. Mean Earnings Estimate the mean earnings of a player in each organization.
Use $19,000,000 as the midpoint for $10,000,001+.

3. Revenue Which organization had the greatest earnings per player? Explain your
reasoning.

4. Standard Deviation Estimate the standard deviation for the earnings of a
player in each organization. Use $19,000,000 as the midpoint for $10,000,001+.

5. Standard Deviation Which organization had the greatest standard deviation?
Explain your reasoning.

6. Bell-Shaped Distribution Of the seven organizations, which is most bell-shaped?
Explain your reasoning.
Measures of Position


WHAT YOU SHOULD LEARN


How to find the first, second, and third quartiles of a data set

How to find the interquartile range of a data set

How to represent a data set graphically using a box-and-whisker plot

How to interpret other fractiles such as percentiles

How to find and interpret the standard score (z-score)


Quartiles > Percentiles and Other Fractiles > The Standard Score


> QUARTILES


In this section, you will learn how to use fractiles to specify the position of a data 
entry within a data set. Fractiles are numbers that partition, or divide, an ordered 
data set into equal parts. For instance, the median is a fractile because it divides 
an ordered data set into two equal parts. 


DEFINITION 


The three quartiles, Q1, Q2, and Q3, approximately divide an ordered data set
into four equal parts. About one quarter of the data fall on or below the first
quartile Q1. About one half of the data fall on or below the second quartile Q2
(the second quartile is the same as the median of the data set). About three
quarters of the data fall on or below the third quartile Q3.


EXAMPLE 1 G® Report 14 


> Finding the Quartiles of a Data Set 


The number of nuclear power plants in the top 15 nuclear power-producing 
countries in the world are listed. Find the first, second, and third quartiles of the 
data set. What can you conclude? (Source: International Atomic Energy Agency) 


7 18 11 6 59 17 18 54 104 20 31 8 10 15 19 


> Solution 


First, order the data set and find the median Q2. Once you find Q2, divide the
data set into two halves. The first and third quartiles are the medians of the
lower and upper halves of the data set.


Ordered data: 6 7 8 10 11 15 17 18 18 19 20 31 54 59 104

Lower half: 6 7 8 10 11 15 17 (median Q1 = 10)
Median of the data set: Q2 = 18
Upper half: 18 19 20 31 54 59 104 (median Q3 = 31)


Interpretation About one fourth of the countries have 10 or fewer nuclear 
power plants; about one half have 18 or fewer; and about three fourths have 
31 or fewer. 
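
The median-of-halves procedure used in this solution can be sketched in Python. This is only an illustration: the function name `quartiles_by_halves` and the variable `plants` are hypothetical names chosen here, and the sketch assumes that when the number of entries is odd, the median is left out of both halves, as in Example 1.

```python
from statistics import median

def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) as the medians of the lower half, the whole
    ordered data set, and the upper half (hypothetical helper)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd number of entries, the median is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

# Number of nuclear power plants in the top 15 countries (Example 1).
plants = [7, 18, 11, 6, 59, 17, 18, 54, 104, 20, 31, 8, 10, 15, 19]
q1, q2, q3 = quartiles_by_halves(plants)
print(q1, q2, q3)  # 10 18 31
```

The printed values agree with Q1 = 10, Q2 = 18, and Q3 = 31 found by hand.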


> Try It Yourself 1 

Find the first, second, and third quartiles for the ages of the 50 richest people 
using the data set listed in the Chapter Opener on page 37. What can you 
conclude? 


a. Order the data set.

b. Find the median Q2.

c. Find the first and third quartiles, Q1 and Q3.

d. Interpret the results in the context of the data. Answer: Page A34
EXAMPLE 2


> Using Technology to Find Quartiles

The tuition costs (in thousands of dollars) for 25 liberal arts colleges are listed. 
Use a calculator or a computer to find the first, second, and third quartiles. 
What can you conclude? 


23 25 30 23 20 22 21 15 25 24 30 25 30 
20 23 29 20 19 22 23 29 23 28 22 28 


> Solution 

MINITAB, Excel, and the TI-83/84 Plus each have features that
automatically calculate quartiles. Try using this technology to find the first,
second, and third quartiles of the tuition data. From the displays, you can
see that Q1 = 21.5, Q2 = 23, and Q3 = 28.


STUDY TIP

There are several ways to find the quartiles of a data set. Regardless of how
you find the quartiles, the results are rarely off by more than one data entry.
For instance, in Example 2, the first quartile, as determined by Excel, is 22
instead of 21.5.


MINITAB

Descriptive Statistics: Tuition

Variable   N    Mean     SE Mean   StDev   Minimum
Tuition    25   23.960   0.788     3.942   15.000

Variable   Q1       Median   Q3       Maximum
Tuition    21.500   23.000   28.000   30.000


EXCEL

Quartile(A1:A25,1)   22
Quartile(A1:A25,2)   23
Quartile(A1:A25,3)   28


TI-83/84 PLUS

1-Var Stats
n=25
minX=15
Q1=21.5
Med=23
Q3=28
maxX=30

Interpretation About one quarter of these colleges charge tuition of $21,500 
or less; one half charge $23,000 or less; and about three quarters charge 
$28,000 or less. 
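
As a further illustration of how different tools can report slightly different quartiles, the sketch below uses `statistics.quantiles` from Python's standard library (Python 3.8 or later). For this particular data set, its "exclusive" method happens to give a first quartile of 21.5 and its "inclusive" method gives 22; that correspondence is an observation about this data set, not a claim about how any calculator or spreadsheet is implemented. The variable names are chosen here for illustration.

```python
from statistics import quantiles

# Tuition costs (in thousands of dollars) for 25 liberal arts colleges (Example 2).
tuition = [23, 25, 30, 23, 20, 22, 21, 15, 25, 24, 30, 25, 30,
           20, 23, 29, 20, 19, 22, 23, 29, 23, 28, 22, 28]

# n=4 asks for the three cut points that split the data into quarters.
exclusive = quantiles(tuition, n=4, method="exclusive")
inclusive = quantiles(tuition, n=4, method="inclusive")

print(exclusive)  # [21.5, 23.0, 28.0]
print(inclusive)  # [22.0, 23.0, 28.0]
```

The two results differ only in the first quartile, and by less than one data entry.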
> Try It Yourself 2

The tuition costs (in thousands of dollars) for 25 universities are listed. Use a 
calculator or a computer to find the first, second, and third quartiles. What can 
you conclude? 


20 26 28 25 31 14 23 15 12 26 29 24 31 
19 31 17 15 17 20 31 32 16 21 22 28 


a. Enter the data. 
b. Calculate the first, second, and third quartiles. 
c. Interpret the results in the context of the data. Answer: Page A34 


After finding the quartiles of a data set, you can find the interquartile range. 


DEFINITION 


The interquartile range (IQR) of a data set is a measure of variation that gives 
the range of the middle 50% of the data. It is the difference between the third 
and first quartiles. 


Interquartile range (IQR) = Q3 - Q1


EXAMPLE 3 


> Finding the Interquartile Range 
Find the interquartile range of the data set given in Example 1. What can you 
conclude from the result? 
> Solution 
From Example 1, you know that Q1 = 10 and Q3 = 31. So, the interquartile
range is

IQR = Q3 - Q1 = 31 - 10 = 21.


Interpretation The numbers of power plants in the middle half of the data
set vary by at most 21.


> Try It Yourself 3 
Find the interquartile range for the ages of the 50 richest people listed in the 
Chapter Opener on page 37. 


a. Find the first and third quartiles, Q1 and Q3.
b. Subtract Q1 from Q3.
c. Interpret the result in the context of the data. Answer: Page A34 


The IQR can also be used to identify outliers. First, multiply the IQR by 1.5.
Then subtract that value from Q1, and add that value to Q3. Any data value
that is smaller than Q1 - 1.5(IQR) or larger than Q3 + 1.5(IQR) is an outlier.
For instance, the IQR in Example 1 is 31 - 10 = 21 and 1.5(21) = 31.5. So,
adding 31.5 to Q3 gives Q3 + 31.5 = 31 + 31.5 = 62.5. Because 104 > 62.5,
104 is an outlier.
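
The outlier rule above can be sketched as follows. The function name `iqr_outliers` is hypothetical, and the quartiles are computed with the median-of-halves method of Example 1, which is an assumption of this sketch; other quartile conventions can move the cutoffs slightly.

```python
from statistics import median

def iqr_outliers(data):
    """Return (IQR, lower cutoff, upper cutoff, outliers) using the
    1.5(IQR) rule (hypothetical helper)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    # For an odd number of entries, skip the median when forming the upper half.
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    outliers = [x for x in ordered if x < low_cutoff or x > high_cutoff]
    return iqr, low_cutoff, high_cutoff, outliers

# Data set from Example 1.
plants = [7, 18, 11, 6, 59, 17, 18, 54, 104, 20, 31, 8, 10, 15, 19]
print(iqr_outliers(plants))  # (21, -21.5, 62.5, [104])
```

No data value falls below the lower cutoff of -21.5, so 104 is the only outlier.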


Another important application of quartiles is to represent data sets using
box-and-whisker plots. A box-and-whisker plot (or boxplot) is an exploratory
data analysis tool that highlights the important features of a data set. To graph a
box-and-whisker plot, you must know the following values.
Of the first 44 Super Bowls played, Super Bowl XIV had the highest
attendance at about 104,000. Super Bowl I had the lowest attendance at about
62,000. The box-and-whisker plot summarizes the attendances (in thousands of
people) at the 44 Super Bowls. (Source: National Football League)

[Box-and-whisker plot: Super Bowl Attendance; horizontal axis: Number of people (in thousands)]

About how many Super Bowl attendances are represented by the right whisker?
About how many are represented by the left whisker?


INSIGHT


You can use a box-and-whisker plot to determine the shape of a distribution.
Notice that the box-and-whisker plot in Example 4 represents a distribution
that is skewed right.
1. The minimum entry
2. The first quartile Q1
3. The median Q2
4. The third quartile Q3
5. The maximum entry


These five numbers are called the five-number summary of the data set. 


GUIDELINES 


Drawing a Box-and-Whisker Plot 

1. Find the five-number summary of the data set.
2. Construct a horizontal scale that spans the range of the data.
3. Plot the five numbers above the horizontal scale.
4. Draw a box above the horizontal scale from Q1 to Q3 and draw a
vertical line in the box at Q2.
5. Draw whiskers from the box to the minimum and maximum entries.


[Figure: labeled box-and-whisker plot showing the minimum entry, Q1, the median Q2, Q3, and the maximum entry, with the box and both whiskers marked]


EXAMPLE 4 G® Report 15 


> Drawing a Box-and-Whisker Plot

Draw a box-and-whisker plot that represents the data set given in Example 1.
What can you conclude from the display?

See MINITAB and TI-83/84 Plus steps on pages 122 and 123.


> Solution 


The five-number summary of the data set is displayed below. Using these five 
numbers, you can construct the box-and-whisker plot shown. 


Min = 6, Q1 = 10, Q2 = 18, Q3 = 31, Max = 104


[Box-and-whisker plot: Number of Power Plants]


Interpretation You can make several conclusions from the display. One is 
that about half the data values are between 10 and 31. By looking at the length 
of the right whisker, you can also conclude that the data value of 104 is a 
possible outlier. 
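
The five-number summary behind this plot, and the whisker lengths mentioned in the interpretation, can be checked with a short sketch. The function name `five_number_summary` is hypothetical, and the quartiles again assume the median-of-halves method from Example 1.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) for a data set (hypothetical helper)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    # For an odd number of entries, skip the median when forming the upper half.
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]

# Data set from Example 1.
plants = [7, 18, 11, 6, 59, 17, 18, 54, 104, 20, 31, 8, 10, 15, 19]
minimum, q1, q2, q3, maximum = five_number_summary(plants)
print(minimum, q1, q2, q3, maximum)  # 6 10 18 31 104

# Whisker lengths: a much longer right whisker suggests a possible outlier.
left_whisker = q1 - minimum
right_whisker = maximum - q3
print(left_whisker, right_whisker)  # 4 73
```

The right whisker is far longer than the left, which matches the observation that 104 is a possible outlier.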


> Try It Yourself 4 


Draw a box-and-whisker plot that represents the ages of the 50 richest people 
listed in the Chapter Opener on page 37. What can you conclude? 


a. Find the five-number summary of the data set. 

b. Construct a horizontal scale and plot the five numbers above it. 

c. Draw the box, the vertical line, and the whiskers. 

d. Make some conclusions. Answer: Page A34 
INSIGHT


Notice that the 25th percentile is the same as Q1; the 50th percentile
is the same as Q2, or the median; and the 75th percentile is the
same as Q3.


STUDY TIP 


It is important that you 
understand what a percentile 
means. For instance, if the weight 
of a six-month-old infant is at 

the 78th percentile, the infant 
weighs more than 78% 
of all six-month-old 
infants. It does not 
mean that the infant 
weighs 78% of some 
ideal weight. 


[Ogive: Ages of the 50 Richest People; horizontal axis: Age; vertical axis: Percentile]
> PERCENTILES AND OTHER FRACTILES


In addition to using quartiles to specify a measure of position, you can also use 
percentiles and deciles. These common fractiles are summarized as follows. 


Quartiles     Divide a data set into 4 equal parts.     Q1, Q2, Q3
Deciles       Divide a data set into 10 equal parts.    D1, D2, D3, ..., D9
Percentiles   Divide a data set into 100 equal parts.   P1, P2, P3, ..., P99


Percentiles are often used in education and health-related fields to indicate 
how one individual compares with others in a group. They can also be used to 
identify unusually high or unusually low values. For instance, test scores and 
children’s growth measurements are often expressed in percentiles. Scores or 
measurements in the 95th percentile and above are unusually high, while those 
in the 5th percentile and below are unusually low. 


EXAMPLE 5 


> Interpreting Percentiles

The ogive at the right represents the cumulative frequency distribution for
SAT test scores of college-bound students in a recent year. What test
score represents the 62nd percentile? How should you interpret this?
(Source: The College Board)

[Ogive: SAT Scores; horizontal axis: Score, from 600 to 2400; vertical axis: Percentile]
> Solution

From the ogive, you can see that the 62nd percentile corresponds to a test
score of 1600.


Interpretation This means that approximately 62% of the students
had an SAT score of 1600 or less.

[Ogive: SAT Scores; horizontal axis: Score, from 600 to 2400; vertical axis: Percentile]


> Try It Yourself 5 


The ages of the 50 richest people are represented in the cumulative frequency 
graph at the left. At what percentile is someone who is 66 years old? How 
should you interpret this? 


a. Use the graph to find the percentile that corresponds to the given age. 
b. Interpret the results in the context of the data. Answer: Page A34 
[Figure: z-score scale with regions labeled "Unusual scores" and "Very unusual scores"]


> THE STANDARD SCORE


When you know the mean and standard deviation of a data set, you can measure 
a data value’s position in the data set with a standard score, or z-score. 


DEFINITION 


The standard score, or z-score, represents the number of standard deviations
a given value x falls from the mean μ. To find the z-score for a given value,
use the following formula.


$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$


A z-score can be negative, positive, or zero. If z is negative, the corresponding 
x-value is less than the mean. If z is positive, the corresponding x-value is greater 
than the mean. And if z= 0, the corresponding x-value is equal to the 
mean. A z-score can be used to identify an unusual value of a data set that is 
approximately bell-shaped. 


EXAMPLE 6 


> Finding z-Scores


The mean speed of vehicles along a stretch of highway is 56 miles per hour 
with a standard deviation of 4 miles per hour. You measure the speeds of three 
cars traveling along this stretch of highway as 62 miles per hour, 47 miles per 
hour, and 56 miles per hour. Find the z-score that corresponds to each speed. 
What can you conclude? 


> Solution 
The z-score that corresponds to each speed is calculated below. 

x = 62 mph: $z = \frac{62 - 56}{4} = 1.5$

x = 47 mph: $z = \frac{47 - 56}{4} = -2.25$

x = 56 mph: $z = \frac{56 - 56}{4} = 0$


Interpretation From the z-scores, you can conclude that a speed of 62 miles 
per hour is 1.5 standard deviations above the mean; a speed of 47 miles per 
hour is 2.25 standard deviations below the mean; and a speed of 56 miles per 
hour is equal to the mean. If the distribution of the speeds is approximately 
bell-shaped, the car traveling 47 miles per hour is said to be traveling unusually 
slowly, because its speed corresponds to a z-score of -2.25.
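
The calculations in this example can be reproduced with a minimal sketch. The function name `z_score` and its parameter names are chosen here for illustration only.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Speeds (in miles per hour) from Example 6; mean 56, standard deviation 4.
for speed in (62, 47, 56):
    print(speed, z_score(speed, mean=56, std_dev=4))
# 62 1.5
# 47 -2.25
# 56 0.0
```

A positive result lies above the mean, a negative result lies below it, and zero lies exactly at the mean.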


> Try It Yourself 6 


The monthly utility bills in a city have a mean of $70 and a standard deviation 
of $8. Find the z-scores that correspond to utility bills of $60, $71, and $92. 
What can you conclude? 


a. Identify μ and σ. Transform each value to a z-score.
b. Interpret the results. Answer: Page A34 


When a distribution is approximately bell-shaped, you know from the
Empirical Rule that about 95% of the data lie within 2 standard deviations of the
mean. So, when this distribution’s values are transformed to z-scores, about 95%
of the z-scores should fall between -2 and 2. A z-score outside of this range will
occur about 5% of the time and would be considered unusual. So, according to
the Empirical Rule, a z-score less than -3 or greater than 3 would be very
unusual, with such a score occurring about 0.3% of the time.
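
These cutoffs can be written as a small classification sketch. The function name `classify_z` and its labels are hypothetical, and the rule is meaningful only under the stated assumption that the distribution is approximately bell-shaped.

```python
def classify_z(z):
    """Label a z-score using the Empirical Rule cutoffs of 2 and 3.
    Assumes an approximately bell-shaped distribution."""
    if abs(z) > 3:
        return "very unusual"
    if abs(z) > 2:
        return "unusual"
    return "usual"

# z-scores from Example 6.
for z in (1.5, -2.25, 0):
    print(z, classify_z(z))
# 1.5 usual
# -2.25 unusual
# 0 usual
```

Only the z-score of -2.25 falls outside the range from -2 to 2.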
In Example 6, you used z-scores to compare data values within the same
data set. You can also use z-scores to compare data values from different 
data sets. 


EXAMPLE 7 


> Comparing z-Scores from Different Data Sets


In 2009, Heath Ledger won the Oscar for Best Supporting Actor at age 29 
for his role in the movie The Dark Knight. Penelope Cruz won the Oscar for 
Best Supporting Actress at age 34 for her role in Vicky Cristina Barcelona. 
The mean age of all Best Supporting Actor winners is 49.5, with a standard 
deviation of 13.8. The mean age of all Best Supporting Actress winners is 39.9, 
with a standard deviation of 14.0. Find the z-scores that correspond to the ages 
of Ledger and Cruz. Then compare your results. 


> Solution 
The z-scores that correspond to the ages of the two performers are calculated 
below. 

Heath Ledger: $z = \frac{x - \mu}{\sigma} = \frac{29 - 49.5}{13.8} \approx -1.49$

Penelope Cruz: $z = \frac{x - \mu}{\sigma} = \frac{34 - 39.9}{14.0} \approx -0.42$


The age of Heath Ledger was 1.49 standard deviations below the mean, and 
the age of Penelope Cruz was 0.42 standard deviation below the mean. 


Interpretation Compared with other Best Supporting Actor winners, Heath 
Ledger was relatively younger, whereas the age of Penelope Cruz was only 
slightly lower than the average age of other Best Supporting Actress winners. 
Both z-scores fall between -2 and 2, so neither score would be considered
unusual. 
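
Because each z-score is measured against its own mean and standard deviation, values from different data sets can be compared on one scale. A minimal sketch, using illustrative names:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Ages and summary statistics from Example 7.
ledger = z_score(29, mean=49.5, std_dev=13.8)  # Best Supporting Actor winners
cruz = z_score(34, mean=39.9, std_dev=14.0)    # Best Supporting Actress winners

print(round(ledger, 2), round(cruz, 2))  # -1.49 -0.42

# The larger absolute z-score is farther from its own group's mean.
print(abs(ledger) > abs(cruz))  # True
```

Both results lie between -2 and 2, in agreement with the interpretation above.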


> Try It Yourself 7 


In 2009, Sean Penn won the Oscar for Best Actor at age 48 for his role in the 
movie Milk. Kate Winslet won the Oscar for Best Actress at age 33 for her role 
in The Reader. The mean age of all Best Actor winners is 43.7, with a standard 
deviation of 8.7. The mean age of all Best Actress winners is 35.9, with a 
standard deviation of 11.4. Find the z-scores that correspond to the ages of 
Penn and Winslet. Then compare your results. 


a. Identify μ and σ for each data set.
b. Transform each value to a z-score. 
c. Compare your results. Answer: Page A34 
Exercises


BUILDING BASIC SKILLS AND VOCABULARY


1. The goals scored per game by a soccer team represent the first quartile for
all teams in a league. What can you conclude about the team’s goals scored
per game?


2. A salesperson at a company sold $6,903,435 of hardware equipment last
year, a figure that represented the eighth decile of sales performance at the
company. What can you conclude about the salesperson’s performance?


3. A student’s score on an actuarial exam is in the 78th percentile. What can you 
conclude about the student’s exam score? 


4. A counselor tells a child’s parents that their child’s IQ is in the 93rd 
percentile for the child’s age group. What can you conclude about the 
child’s IQ? 


5. Explain how the interquartile range of a data set can be used to identify 
outliers. 


6. Describe the relationship between quartiles and percentiles. 


True or False? In Exercises 7-14, determine whether the statement is true or
false. If it is false, rewrite it as a true statement.


7. The mean and median of a data set are both fractiles.
8. About one quarter of a data set falls below Q1.
9. The second quartile is the median of an ordered data set.
10. The five numbers you need to graph a box-and-whisker plot are the
minimum, the maximum, Q1, Q3, and the mean.
11. The 50th percentile is equivalent to Q1.
12. It is impossible to have a z-score of 0.
13. A z-score of -2.5 is considered very unusual.
14. A z-score of 1.99 is considered usual.


USING AND INTERPRETING CONCEPTS


Graphical Analysis In Exercises 15-20, use the box-and-whisker plot to
identify (a) the five-number summary, and (b) the interquartile range.


[Box-and-whisker plots for Exercises 15-20, with the following values marked]

15. 10, 13, 15, 17, 20
16. 100, 130, 205, 270, 320
17. 900, 1250, 1500, 1950, 2100
18. 25, 50, 65, 70, 85
19. -1.9, -0.5, 0.1, 0.7, 2.1
20. -1.3, -0.3, 0.2, 0.4, 2.1
In Exercises 21-24, (a) find the five-number summary, and (b) draw a
box-and-whisker plot of the data.


21. 39 36 30 27 26 24 28 35 39 60 50 41 35 32 51 
22. 171 176 182 150 178 180 173 170 174 178 181 180 


pg 232.47752976858415287669 


Ga 
N 
aS 


ae 
23 


WwW oO 


99 2 4 3.75 4 7 
93 4 8 239 SD 


con 


13 12 5 
5-9 3 6 9 
Interpreting Graphs In Exercises 25-28, use the box-and-whisker plot to
determine if the shape of the distribution represented is symmetric, skewed left,
skewed right, or none of these. Justify your answer.


[Box-and-whisker plots for Exercises 25-28]


29. Graphical Analysis The letters A, B, and C are marked on the histogram.
Match them with Q1, Q2 (the median), and Q3. Justify your answer.


[Histograms: FIGURE FOR EXERCISE 29 and FIGURE FOR EXERCISE 30]


30. Graphical Analysis The letters R, S, and T are marked on the histogram.
Match them with P10, P50, and P80. Justify your answer.


Using Technology to Find Quartiles and Draw Graphs In Exercises
31-34, use a calculator or a computer to (a) find the data set’s first, second, and
third quartiles, and (b) draw a box-and-whisker plot that represents the data set.


31. TV Viewing The number of hours of television watched per day by a
sample of 28 people


2 4 1 5 7 2 5 4 4 2 3 6 4 3
5 2 0 3 5 9 4 5 2 1 3 6 7 2
32. Vacation Days The number of vacation days used by a sample of
20 employees in a recent year 


39 22°7 3.3 2 2.16 
4010035 78 65 


33. Airplane Distances The distances (in miles) from an airport of a 
sample of 22 inbound and outbound airplanes 


2.8 2.0 3.0 3.0 3.2 5.9 3.5 3.6
1.8 5.5 3.7 5.2 3.8 3.9 6.0 2.5
4.0 4.1 4.6 5.0 5.5 6.0


34. Hourly Earnings The hourly earnings (in dollars) of a sample of 
25 railroad equipment manufacturers 


15.60 18.75 14.60 15.80 14.35 13.90 17.50 17.55 13.80 
14.20 19.05 15.35 15.20 19.45 15.95 16.50 16.30 15.25 
15.05 19.10 15.20 16.22 17.75 18.40 15.25


35. TV Viewing Refer to the data set given in Exercise 31 and the
box-and-whisker plot you drew that represents the data set.


(a) About 75% of the people watched no more than how many hours of
television per day?

(b) What percent of the people watched more than 4 hours of television per
day?

(c) If you randomly selected one person from the sample, what is the
likelihood that the person watched less than 2 hours of television per
day? Write your answer as a percent.


36. Manufacturer Earnings Refer to the data set given in Exercise 34 and the
box-and-whisker plot you drew that represents the data set.

(a) About 75% of the manufacturers made less than what amount per hour?

(b) What percent of the manufacturers made more than $15.80 per hour?

(c) If you randomly selected one manufacturer from the sample, what is the
likelihood that the manufacturer made less than $15.80 per hour? Write
your answer as a percent.


Graphical Analysis In Exercises 37 and 38, the midpoints A, B, and C are
marked on the histogram. Match them with the indicated z-scores. Which z-scores,
if any, would be considered unusual?


37. z = 0, z = 2.14, z = -1.43

[Histogram: Statistics Test Scores; horizontal axis: Score (out of 80), with midpoints 53, 58, 63, 68, 73, 78; A, B, and C marked]


38. z = 0.77, z = 1.54, z = -1.54

[Histogram: Biology Test Scores; horizontal axis: Score (out of 30), with midpoints 17, 20, 23, 26, 29; A, B, and C marked]
[Ogive: Adult Males Ages 20-29; horizontal axis: Height (in inches); vertical axis: Percentile]
FIGURE FOR EXERCISES 45-50


Comparing Test Scores For the statistics test scores in Exercise 37, the mean
is 63 and the standard deviation is 7.0, and for the biology test scores in Exercise
38, the mean is 23 and the standard deviation is 3.9. In Exercises 39-42, you are
given the test scores of a student who took both tests.

(a) Transform each test score to a z-score.

(b) Determine on which test the student had a better score.


39. A student gets a 75 on the statistics test and a 25 on the biology test.
40. A student gets a 60 on the statistics test and a 22 on the biology test.
41. A student gets a 78 on the statistics test and a 29 on the biology test.
42. A student gets a 63 on the statistics test and a 23 on the biology test.


43. Life Spans of Tires A certain brand of automobile tire has a mean life span
of 35,000 miles, with a standard deviation of 2250 miles. (Assume the life 
spans of the tires have a bell-shaped distribution.) 


(a) The life spans of three randomly selected tires are 34,000 miles, 
37,000 miles, and 30,000 miles. Find the z-score that corresponds to each 
life span. According to the z-scores, would the life spans of any of these 
tires be considered unusual? 


(b) The life spans of three randomly selected tires are 30,500 miles, 
37,250 miles, and 35,000 miles. Using the Empirical Rule, find the 
percentile that corresponds to each life span. 


44. Life Spans of Fruit Flies The life spans of a species of fruit fly have a
bell-shaped distribution, with a mean of 33 days and a standard deviation of 4 days.


(a) The life spans of three randomly selected fruit flies are 34 days, 30 days, 
and 42 days. Find the z-score that corresponds to each life span and 
determine if any of these life spans are unusual. 


(b) The life spans of three randomly selected fruit flies are 29 days, 41 days, 
and 25 days. Using the Empirical Rule, find the percentile that 
corresponds to each life span. 


Interpreting Percentiles In Exercises 45-50, use the cumulative frequency
distribution to answer the questions. The cumulative frequency distribution
represents the heights of males in the United States in the 20-29 age group. The
heights have a bell-shaped distribution (see Picturing the World, page 86) with a
mean of 69.9 inches and a standard deviation of 3.0 inches. (Adapted from National
Center for Health Statistics)


45. What height represents the 60th percentile? How should you interpret this?

46. What percentile is a height of 77 inches? How should you interpret this?

47. Three adult males in the 20-29 age group are randomly selected. Their
heights are 74 inches, 62 inches, and 80 inches. Use z-scores to determine
which heights, if any, are unusual.

48. Three adult males in the 20-29 age group are randomly selected. Their
heights are 70 inches, 66 inches, and 68 inches. Use z-scores to determine
which heights, if any, are unusual.

49. Find the z-score for a male in the 20-29 age group whose height is
71.1 inches. What percentile is this?

50. Find the z-score for a male in the 20-29 age group whose height is
66.3 inches. What percentile is this?
EXTENDING CONCEPTS


51. Ages of Executives The ages of a sample of 100 executives are listed.


31 62 51 44 61 47 49 45 40 52 60 51 67
47 63 54 59 43 63 52 50 54 61 41 48 49
51 54 39 54 47 52 36 53 74 33 53 68 44


[FIGURE FOR EXERCISE 51]


(a) Find the five-number summary.

(b) Draw a box-and-whisker plot that represents the data set.

(c) Interpret the results in the context of the data.


(d) On the basis of this sample, at what age would you expect to be an 
executive? Explain your reasoning. 


(e) Which age groups, if any, can be considered unusual? Explain your 
reasoning. 


Midquartile Another measure of position is called the midquartile. You can
find the midquartile of a data set by using the following formula.

$$\text{Midquartile} = \frac{Q_1 + Q_3}{2}$$

In Exercises 52-55, find the midquartile of the given data set.

52. 5 7 12 3 10 8 7 5 3
53. 23 36 47 33 34 40 39 24 32 22 38 41 


54. 12.3 9.7 8.0 15.4 16.1 11.8 12.7 13.4
12.2 8.1 7.9 10.3 11.2


55. 21.4 20.8 19.7 15.2 31.9 18.7 15.6 16.7 
19.8 13.4 22.9 28.7 19.8 17.2 30.1 
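
As an illustration of the midquartile formula, the sketch below applies it to the power plant data from Example 1 rather than to the exercise data. The function name `midquartile` is hypothetical, and the quartiles assume the median-of-halves method used in Example 1.

```python
from statistics import median

def midquartile(data):
    """Return (Q1 + Q3) / 2 for a data set (hypothetical helper)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    # For an odd number of entries, skip the median when forming the upper half.
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    return (q1 + q3) / 2

# Data set from Example 1, where Q1 = 10 and Q3 = 31.
plants = [7, 18, 11, 6, 59, 17, 18, 54, 104, 20, 31, 8, 10, 15, 19]
print(midquartile(plants))  # 20.5
```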


56. Song Lengths Side-by-side box-and-whisker plots can be used to compare 
two or more different data sets. Each box-and-whisker plot is drawn on the 
same number line to compare the data sets more easily. The lengths (in 
seconds) of songs played at two different concerts are shown. 


[Side-by-side box-and-whisker plots; horizontal axis: Concert length (in seconds), scaled from 125 to 400]

Concert 1: 177, 200, 210, 220, 240
Concert 2: 200, 224, 275, 288, 390


(a) Describe the shape of each distribution. Which concert has less variation 
in song lengths? 

(b) Which distribution is more likely to have outliers? Explain your 
reasoning. 


(c) Which concert do you think has a standard deviation of 16.3? Explain 
your reasoning. 


(d) Can you determine which concert lasted longer? Explain. 
57. Credit Card Purchases The monthly credit card purchases (rounded to the
nearest dollar) over the last two years for you and a friend are listed. 


You: 60 95 102 110 130 130 162 200 215 120 124 28 
58 40 102 105 141 160 130 210 145 90 46 76 


Friend: 100 125 132 90 85 75 140 160 180 190 160 105 
145 150 151 82 78 115 170 158 140 130 165 125 


Use a calculator or a computer to draw a side-by-side box-and-whisker plot 
that represents the data sets. Then describe the shapes of the distributions. 


Finding Percentiles You can find the percentile that corresponds to a specific
data value x by using the following formula, then rounding the result to the
nearest whole number.

$$\text{Percentile of } x = \frac{\text{number of data values less than } x}{\text{total number of data values}} \cdot 100$$
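
A minimal sketch of this formula, applied to the power plant data from Example 1 rather than to the exercise data. The function name `percentile_of` is hypothetical. Note also that Python's built-in `round` rounds exact halves to the nearest even integer, which could differ from hand rounding in a tie.

```python
def percentile_of(data, x):
    """Percentile of x: the share of data values less than x, times 100,
    rounded to the nearest whole number (hypothetical helper)."""
    below = sum(1 for value in data if value < x)
    return round(below / len(data) * 100)

# Data set from Example 1 (15 entries).
plants = [7, 18, 11, 6, 59, 17, 18, 54, 104, 20, 31, 8, 10, 15, 19]

print(percentile_of(plants, 31))   # 73, because 11 of the 15 values are less than 31
print(percentile_of(plants, 104))  # 93, because 14 of the 15 values are less than 104
```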


In Exercises 58 and 59, use the information from Example 7 and the fact that 
there have been 73 Oscars for Best Supporting Actor and 73 Oscars for Best 
Supporting Actress awarded. 


58. Only three winners were younger than Heath Ledger when they won the 
Oscar for Best Supporting Actor. Find the percentile that corresponds to 
Heath Ledger’s age. 


59. Forty-three winners were older than Penelope Cruz when they won the 
Oscar for Best Supporting Actress. Find the percentile that corresponds to 
Penelope Cruz’s age. 


Modified Boxplot A modified boxplot is a boxplot that uses symbols to
identify outliers. The horizontal line of a modified boxplot extends as far as the
minimum data value that is not an outlier and the maximum data value that is not
an outlier. In Exercises 60 and 61, (a) identify any outliers (using the 1.5 × IQR
rule), and (b) draw a modified boxplot that represents the data set. Use asterisks (*)
to identify outliers.


60. 16 9 11 12 8 10 12 13 11 10 24 9 2 15 7 
61. 75 78 80 75 62 72 74 75 80 95 76 72 
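
The values needed for a modified boxplot can be sketched as follows, using the power plant data from Example 1 rather than the exercise data. The function name `modified_boxplot_parts` and the dictionary keys are hypothetical, and the quartiles assume the median-of-halves method used in Example 1.

```python
from statistics import median

def modified_boxplot_parts(data):
    """Return the box values, whisker ends, and outliers for a modified
    boxplot, using the 1.5 x IQR rule (hypothetical helper)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    # For an odd number of entries, skip the median when forming the upper half.
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    reach = 1.5 * (q3 - q1)
    # Whiskers stop at the most extreme values that are not outliers.
    inside = [x for x in ordered if q1 - reach <= x <= q3 + reach]
    outliers = [x for x in ordered if x < q1 - reach or x > q3 + reach]
    return {
        "whisker_min": inside[0],
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "whisker_max": inside[-1],
        "outliers": outliers,
    }

plants = [7, 18, 11, 6, 59, 17, 18, 54, 104, 20, 31, 8, 10, 15, 19]
parts = modified_boxplot_parts(plants)
print(parts["whisker_min"], parts["whisker_max"], parts["outliers"])  # 6 59 [104]
```

For this data set, the right whisker would end at 59, and 104 would be marked with an asterisk.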


In Exercises 62 and 63, use StatCrunch to (a) find the five-number summary, 
(b) construct a regular boxplot, and (c) construct a modified boxplot for the data. 


62. The data represent the speeds (in miles per hour) of several vehicles. 
68 88 70 72 70 69 72 62 65 70 75 52 65 


63. The data represent the weights (in pounds) of several professional football 
players. 


225 250 305 285 275 265 290 310 290 250 210 225 
308 325 260 165 195 245 235 298 395 255 268 190 
[Two time series charts: Procter & Gamble’s Stock Price; vertical axis: Stock price (in dollars); horizontal axis: Year]


USES AND ABUSES 


Uses 


Descriptive statistics help you see trends or patterns in a set of raw
data. A good description of a data set consists of (1) a measure of the center
of the data, (2) a measure of the variability (or spread) of the data, and (3) the
shape (or distribution) of the data. When you read reports, news items, or
advertisements prepared by other people, you are seldom given the raw data
used for a study. Instead you see graphs, measures of central tendency, and
measures of variability. To be a discerning reader, you need to understand the
terms and techniques of descriptive statistics.


Abuses 


Knowing how statistics are calculated can help you analyze questionable
statistics. For instance, suppose you are interviewing for a sales position
and the company reports that the average yearly commission earned by
the five people in its sales force is $60,000. This is a misleading statement
if it is based on four commissions of $25,000 and one of $200,000. The
median would more accurately describe the yearly commission, but the
company used the mean because it is a greater amount.
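
The gap between the two measures in this scenario can be verified with a few lines, using the five commissions described above. The variable name is chosen here for illustration.

```python
from statistics import mean, median

# Four commissions of $25,000 and one of $200,000.
commissions = [25_000, 25_000, 25_000, 25_000, 200_000]

print(mean(commissions))    # 60000
print(median(commissions))  # 25000
```

The single large commission pulls the mean up to $60,000, while the median stays at the $25,000 that four of the five salespeople earned.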


i i i i i 
T T T T T T T 
2002 2003 2004 2005 2006 2007 2008 2! 
Year 


Statistical graphs can also be misleading. Compare the two time
series charts at the left, which show the year-end stock prices for the
Procter & Gamble Corporation. The data are the same for each chart.
The first graph, however, has a cropped vertical axis, which makes it
appear that the stock price increased greatly from 2002 to 2007, then
decreased greatly from 2007 to 2009. In the second graph, the scale on
the vertical axis begins at zero. This graph correctly shows that the stock
price changed modestly during this time period. (Source: Procter &
Gamble Corporation)




Ethics 


Mark Twain helped popularize the saying, “There are three kinds of lies: lies, 
damned lies, and statistics.” In short, even the most accurate statistics can be 
used to support studies or statements that are incorrect. Unscrupulous people 
can use misleading statistics to “prove” their point. Being informed about how 
statistics are calculated and questioning the data are ways to avoid being misled. 


EXERCISES


1. Use the Internet or some other resource to find an example of a graph that 
might lead to incorrect conclusions. 


2. You are publishing an article that discusses how eating oatmeal can help 
lower cholesterol. Because eating oatmeal might help people with high 
cholesterol, you include a graph that exaggerates the effects of eating 
oatmeal on lowering cholesterol. Do you think it is ethical to publish this 
graph? Explain. 
CHAPTER SUMMARY


What did you learn? | EXAMPLE(S) | REVIEW EXERCISES

Section 2.1
How to construct a frequency distribution including limits, midpoints, relative frequencies, cumulative frequencies, and boundaries | 1, 2 | 1
How to construct frequency histograms, frequency polygons, relative frequency histograms, and ogives | 3-7 | 2-6

Section 2.2
How to graph quantitative data sets using stem-and-leaf plots and dot plots | 1-3 | 7, 8
How to graph and interpret paired data sets using scatter plots and time series charts | 6, 7 | 9, 10
How to graph qualitative data sets using pie charts and Pareto charts | 4, 5 | 11, 12

Section 2.3
How to find the mean, median, and mode of a population and a sample | 1-6 | 13, 14
How to find a weighted mean of a data set and the mean of a frequency distribution | 7, 8 | 15-18
How to describe the shape of a distribution as symmetric, uniform, or skewed and how to compare the mean and median for each | | 19-24

Section 2.4
How to find the range of a data set | 1 | 25, 26
How to find the variance and standard deviation of a population and a sample | 2-5 | 27-30
How to use the Empirical Rule and Chebychev’s Theorem to interpret standard deviation | 6-8 | 31-34
How to approximate the sample standard deviation for grouped data | 9, 10 | 35, 36

Section 2.5
How to find the quartiles and interquartile range of a data set | 1-3 | 37, 38, 41
How to draw a box-and-whisker plot | 4 | 39, 40, 42
How to interpret other fractiles such as percentiles | 5 | 43, 44
How to find and interpret the standard score (z-score) | 6, 7 | 45-48
REVIEW EXERCISES

SECTION 2.1


In Exercises 1 and 2, use the following data set. The data set represents the
number of students per faculty member for 20 public colleges. (Source: 
Kiplinger) 


13 15 15 8 16 20 28 19 18 15
21 23 30 17 10 16 15 16 20 15 


1. Make a frequency distribution of the data set using five classes. Include the 
class limits, midpoints, boundaries, frequencies, relative frequencies, and 
cumulative frequencies. 


2. Make a relative frequency histogram using the frequency distribution in 
Exercise 1. Then determine which class has the greatest relative frequency and 
which has the least relative frequency.


In Exercises 3 and 4, use the following data set. The data represent the actual
liquid volumes (in ounces) in 24 twelve-ounce cans.


11.95 11.91 11.86 11.94 12.00 11.93 12.00 11.94 
12.10 11.95 11.99 11.94 11.89 12.01 11.99 11.94 
11.92 11.98 11.88 11.94 11.98 11.92 11.95 11.93 

3. Make a frequency histogram of the data set using seven classes.
4. Make a relative frequency histogram of the data set using seven classes.


In Exercises 5 and 6, use the following data set. The data represent the
number of rooms reserved during one night's business at a sample of hotels.


153 104 118 166 89 104 100 79 

93 96 116 94 140 84 81 96 
108 111 87 126 101 111 122 108 
126 93 108 87 103 95 129 93 


5. Make a frequency distribution of the data set with six classes and draw a 
frequency polygon. 


6. Make an ogive of the data set using six classes. 


SECTION 2.2


In Exercises 7 and 8, use the following data set. The data represent the air
quality indices for 30 U.S. cities. (Source: AIRNow)


25 35 20 75 10 10 61 89 44 22 
34 33 38 30 47 53 44 57 71 20 
42 52 48 41 35 59 53 61 65 25 
7. Make a stem-and-leaf plot of the data set. Use one line per stem. 
8. Make a dot plot of the data set. 

9. The following are the heights (in feet) and the number of stories of nine
notable buildings in Houston. Use the data to construct a scatter plot. What
type of pattern is shown in the scatter plot? (Source: Emporis Corporation)


Height (in feet)     992 780 762 756 741 732 714 662 579
Number of stories     71  56  53  55  47  53  50  49  40

10. The U.S. unemployment rate over a 12-year period is given. Use the data
to construct a time series chart. (Source: U.S. Bureau of Labor Statistics)


Year                1998 1999 2000 2001 2002 2003
Unemployment rate    4.5  4.2  4.0  4.7  5.8  6.0

Year                2004 2005 2006 2007 2008 2009
Unemployment rate    5.5  5.1  4.6  4.6  5.8  9.3


In Exercises 11 and 12, use the following data set. The data set represents the results 
of a survey that asked U.S. adults where they would be at midnight when the new 
year arrived. (Adapted from Rasmussen Reports) 


At home   At friend’s home   At restaurant or bar   Somewhere else   Not sure
  620           110                   50                  100           130


11. Make a Pareto chart of the data set. 
12. Make a pie chart of the data set. 


SECTION 2.3


In Exercises 13 and 14, find the mean, median, and mode of the data, if possible. If 
any of these measures cannot be found or a measure does not represent the center 
of the data, explain why. 


13. Vertical Jumps The vertical jumps (in inches) of a sample of 10 college
basketball players at the 2009 NBA Draft Combine (Source: Sports Phenoms, Inc.)

26.0 29.5 27.0 30.5 29.5 25.0 31.5 33.0 32.0 27.5

14. Airport Scanners The responses of 542 adults who were asked whether
they approved the use of full-body scanners at airport security checkpoints
(Adapted from USA Today/Gallup Poll)

Approved: 423 Did not approve: 108 No opinion: 11


15. Estimate the mean of the frequency distribution you made in Exercise 1. 


16. The following frequency distribution shows the number of magazine 
subscriptions per household for a sample of 60 households. Find the mean 
number of subscriptions per household. 


Number of magazines    0   1   2   3   4   5   6
Frequency             13   9  19   8   5   2   4


17. Six test scores are given. The first 5 test scores are 15% of the final grade, and 
the last test score is 25% of the final grade. Find the weighted mean of the 
test scores. 


78 72 86 91 87 80 


18. Four test scores are given. The first 3 test scores are 20% of the final grade, 
and the last test score is 40% of the final grade. Find the weighted mean of 
the test scores. 


96 85 91 86 


19. Describe the shape of the distribution in the histogram you made in Exercise 3. 
Is the distribution symmetric, uniform, or skewed? 


20. Describe the shape of the distribution in the histogram you made in Exercise 4. 
Is the distribution symmetric, uniform, or skewed? 

In Exercises 21 and 22, determine whether the approximate shape of the
distribution in the histogram is symmetric, uniform, skewed left, skewed right,
or none of these. Justify your answer.


21. [Histogram; horizontal axis marked 2, 6, 10, 14, 18, 22, 26, 30, 34]
22. [Histogram; horizontal axis marked 2, 6, 10, 14, 18, 22, 26, 30, 34]


23. For the histogram in Exercise 21, which is greater, the mean or the median? 
Explain your reasoning. 


24. For the histogram in Exercise 22, which is greater, the mean or the median? 
Explain your reasoning. 


SECTION 2.4


25. The data set represents the mean prices of movie tickets (in U.S. dollars) for
a sample of 12 U.S. cities. Find the range of the data set.


7.82 7.38 6.42 6.76 6.34 7.44 6.15 5.46 7.92 6.58 8.26 7.17


26. The data set represents the mean prices of movie tickets (in U.S. dollars) for 
a sample of 12 Japanese cities. Find the range of the data set. 


19.73 16.48 19.10 18.56 17.68 17.19
16.63 15.99 16.66 19.59 15.89 16.49 


27. The mileages (in thousands of miles) for a rental car company’s fleet are 
listed. Find the population mean and the population standard deviation of 
the data. 


429 12 15 3 6 8 14 14 12 3 3 


28. The ages of the Supreme Court justices as of January 27, 2010 are listed. Find 
the population mean and the population standard deviation of the data. 
(Source: Supreme Court of the United States) 


55 89 73 73 61 76 71 59 55 


29. Dormitory room prices (in dollars) for one school year for a sample of 
four-year universities are listed. Find the sample mean and the sample 
standard deviation of the data. 


2445 2940 2399 1960 2421 2940 2657 2153 
2430 2278 1947 2383 2710 2761 2377 


30. Sample salaries (in dollars) of high school teachers are listed. Find the 
sample mean and the sample standard deviation of the data. 


49,632 54,619 58,298 48,250 51,842 50,875 53,219 49,924 


31. The mean rate for satellite television for a sample of households was $49.00 
per month, with a standard deviation of $2.50 per month. Between what two 
values do 99.7% of the data lie? (Assume the data set has a bell-shaped 
distribution.) 


32. The mean rate for satellite television for a sample of households was $49.50 
per month, with a standard deviation of $2.75 per month. Estimate the 
percent of satellite television rates between $46.75 and $52.25. (Assume the 
data set has a bell-shaped distribution.) 

33. The mean sale per customer for 40 customers at a gas station is $36.00, with
a standard deviation of $8.00. Using Chebychev’s Theorem, determine at
least how many of the customers spent between $20.00 and $52.00.


34. The mean length of the first 20 space shuttle flights was about 7 days, and
the standard deviation was about 2 days. Using Chebychev’s Theorem,
determine at least how many of the flights lasted between 3 days and 11 days.
(Source: NASA)


35. From a random sample of households, the number of televisions are listed.
Find the sample mean and the sample standard deviation of the data.


Number of televisions    0   1   2   3   4   5
Number of households     1   8  13  10   5   3


36. From a random sample of airplanes, the number of defects found in
their fuselages are listed. Find the sample mean and the sample standard
deviation of the data.


Number of defects      0  1  2  3  4  5  6
Number of airplanes    4  5  2  9  1  3  1


SECTION 2.5 


In Exercises 37-40, use the following data set. The data represent the 
fuel economies (in highway miles per gallon) of several Harley-Davidson 
motorcycles. (Source: Total Motorcycle) 


53 57 60 57 54 53 54 53 54 42 48
53 47 47 50 48 42 42 54 54 60 


37. Find the five-number summary of the data set.
38. Find the interquartile range.
39. Make a box-and-whisker plot of the data.
40. About how many motorcycles fall on or below the third quartile?
41. Find the interquartile range of the data from Exercise 13.


42. The weights (in pounds) of the defensive players on a high school football
team are given. Draw a box-and-whisker plot of the data and describe the
shape of the distribution.


173 145 205 192 197 227 156 240 172 185
208 185 190 167 212 228 190 184 195


43. A student’s test grade of 75 represents the 65th percentile of the grades.
What percent of students scored higher than 75?


44. As of January 2010, there were 755 “oldies” radio stations in the United
States. If one station finds that 104 stations have a larger daily audience than
it has, what percentile does this station come closest to in the daily audience
rankings? (Source: Radio-locator.com)


In Exercises 45-48, use the following information. The towing capacities (in 
pounds) of 25 four-wheel drive pickup trucks have a bell-shaped distribution, with 
a mean of 11,830 pounds and a standard deviation of 2370 pounds. Use z-scores to 
determine if the towing capacities of the following randomly selected four-wheel 
drive pickup trucks are unusual. 


45. 16,500 pounds     46. 5500 pounds
47. 18,000 pounds     48. 11,300 pounds

CHAPTER QUIZ


Take this quiz as you would take a quiz in class. After you are done, check your
work against the answers given in the back of the book.


1. The data set represents the number of minutes a sample of 25 people
exercise each week.


108 139 120 123 120 132 123 131 131 
157 150 124 111 101 135 119 116 117 
127 128 139 119 118 114 127 


(a) Make a frequency distribution of the data set using five classes. 
Include class limits, midpoints, boundaries, frequencies, relative 
frequencies, and cumulative frequencies. 


(b) Display the data using a frequency histogram and a frequency 
polygon on the same axes. 


(c) Display the data using a relative frequency histogram. 
(d) Describe the distribution’s shape as symmetric, uniform, or skewed. 
(e) Display the data using a stem-and-leaf plot. Use one line per stem. 
(f) Display the data using a box-and-whisker plot. 
(g) Display the data using an ogive. 
2. Use frequency distribution formulas to approximate the sample mean and the 
sample standard deviation of the data set in Exercise 1. 


3. U.S. sporting goods sales (in billions of dollars) can be classified in four areas: 
clothing (10.6), footwear (17.2), equipment (24.9), and recreational transport 
(27.0). Display the data using (a) a pie chart and (b) a Pareto chart. (Source: 
National Sporting Goods Association) 


4. Weekly salaries (in dollars) for a sample of registered nurses are listed.

774 446 1019 795 908 667 444 960

(a) Find the mean, median, and mode of the salaries. Which best describes a 
typical salary? 


(b) Find the range, variance, and standard deviation of the data set. Interpret 
the results in the context of the real-life setting. 


5. The mean price of new homes from a sample of houses is $155,000 with a 
standard deviation of $15,000. The data set has a bell-shaped distribution. 
Between what two prices do 95% of the houses fall? 


6. Refer to the sample statistics from Exercise 5 and use z-scores to determine 
which, if any, of the following house prices is unusual. 


(a) $200,000 (b) $55,000 (c) $175,000 (d) $122,000

7. The number of regular season wins for each Major League Baseball team
in 2009 are listed. (Source: Major League Baseball)

103 95 84 75 64 87 86 79 65 65 97 87 85 75 93
87 86 70 59 91 83 80 78 74 62 95 92 88 75 70


(a) Find the five-number summary of the data set. 
(b) Find the interquartile range. 
(c) Display the data using a box-and-whisker plot. 

You are a member of your local apartment association. The association
represents rental housing owners and managers who operate 
residential rental property throughout the greater metropolitan area. 
Recently, the association has received several complaints from tenants 
in a particular area of the city who feel that their monthly rental fees 
are much higher compared to other parts of the city. 

You want to investigate the rental fees. You gather the data shown 
in the table at the right. Area A represents the area of the city where 
tenants are unhappy about their monthly rents. The data represent the 
monthly rents paid by a random sample of tenants in Area A and three 
other areas of similar size. Assume all the apartments represented are 
approximately the same size with the same amenities. 


1. How Would You Do It? 


(a) How would you investigate the complaints from renters who 
are unhappy about their monthly rents? 


(b) Which statistical measure do you think would best represent 
the data sets for the four areas of the city? 


(c) Calculate the measure from part (b) for each of the four areas. 
2. Displaying the Data 


(a) What type of graph would you choose to display the data? 
Explain your reasoning. 


(b) Construct the graph from part (a). 

(c) Based on your data displays, does it appear that the monthly 
rents in Area A are higher than the rents in the other areas of 
the city? Explain. 

3. Measuring the Data 


(a) What other statistical measures in this chapter could you use to 
analyze the monthly rent data? 


(b) Calculate the measures from part (a). 


(c) Compare the measures from part (b) with the graph you 
constructed in Exercise 2. Do the measurements support your 
conclusion in Exercise 2? Explain. 


4. Discussing the Data 


(a) Do you think the complaints in Area A are legitimate? How do 
you think they should be addressed? 


(b) What reasons might you give as to why the rents vary among 
different areas of the city? 


Real Statistics — Real Decisions


The Monthly Rents (in dollars) Paid 
by 12 Randomly Selected Apartment 
Tenants in 4 Areas of Your City 


Area A  Area B  Area C  Area D


1275 1124 1085 928 
1110 954 827 1096 
975 815 793 862 
862 1078 1170 735 
1040 843 919 798 
997 745 943 812 
1119 796 756 1232 
908 816 765 1036 
890 938 809 998 
1055 1082 1020 914 
860 750 710 1005 
975 703 7715 930 

Highest Monthly Rents (Average per City)

New York, NY         $2922
San Francisco, CA    $1904
Boston, MA           $1658
San Jose, CA         $1612
Los Angeles, CA      $1452

(Source: Forbes)

TECHNOLOGY     MINITAB     TI-83/84 PLUS


Dairy Farmers of America is an association that
provides help to dairy farmers. Part of this help is
gathering and distributing statistics on milk
production. (www.dfamilk.com)


MONTHLY MILK PRODUCTION

The following data set was supplied by a dairy
farmer. It lists the monthly milk productions (in
pounds) for 50 Holstein dairy cows. (Source:
Matlink Dairy, Clymer, NY)

2825 2072 2733 2069 2484
4285 2862 3353 1449 2029
1258 2982 2045 1677 1619
2597 3512 2444 1773 2284
1884 2359 2046 2364 2669
3109 2804 1658 2207 2159
2207 2882 1647 2051 2202
3223 2383 1732 2230 1147
2711 1874 1979 1319 2923
2281 1230 1665 1294 2936


[Figure: time series chart titled "Milk Cows, 1999-2008." (Source: National Agricultural Statistics Service)]

[Figure: second time series chart, 1999 to 2008; horizontal axis: Year. (Source: National Agricultural Statistics Service)]


From 1999 to 2008, the number of dairy
cows in the United States increased by only
1.7% while the yearly milk production per
cow increased by almost 15%.


EXERCISES


In Exercises 1-4, use a computer or calculator. If
possible, print your results.

1. Find the sample mean of the data.

2. Find the sample standard deviation of the data.

3. Make a frequency distribution for the data.
Use a class width of 500.

4. Draw a histogram for the data. Does the
distribution appear to be bell-shaped?

5. What percent of the distribution lies within
one standard deviation of the mean? Within
two standard deviations of the mean? How do
these results agree with the Empirical Rule?


In Exercises 6-8, use the frequency distribution
found in Exercise 3.

6. Use the frequency distribution to estimate the
sample mean of the data. Compare your results
with Exercise 1.

7. Use the frequency distribution to find the
sample standard deviation for the data.
Compare your results with Exercise 2.

8. Writing Use the results of Exercises 6 and 7 to
write a general statement about the mean and
standard deviation for grouped data. Do the
formulas for grouped data give results that are
as accurate as the individual entry formulas?


Extended solutions are given in the Technology Supplement. 
Technical instruction is provided for MINITAB, Excel, and the TI-83/84 Plus. 

USING TECHNOLOGY TO DETERMINE
DESCRIPTIVE STATISTICS



Here are some MINITAB and TI-83/84 Plus printouts for three examples 
in this chapter. 


(See Example 7, page 59.) 


MINITAB


[Time series chart: Subscribers (in millions), 1998 to 2008]


(See Example 4, page 83.) 


MINITAB 


Descriptive Statistics: Salaries


Variable    N     Mean     SE Mean   StDev    Minimum
Salaries    10    41.500   0.992     3.139    37.000

Variable    Q1            Median        Q3       Maximum
Salaries    [illegible]   [illegible]   44.250   47.000

(See Example 4, page 103.) 
MINITAB 


[Plot not reproduced]

TI-83/84 PLUS


STAT PLOTS
1: Plot1...Off
2: Plot2...Off
3: Plot3...Off


TI-83/84 PLUS


Plot1 Plot2 Plot3
On Off
Type: [plot type icons]
Xlist: L1
Ylist: L2
Mark: [mark icons]


TI-83/84 PLUS


ZOOM MEMORY
4: ZDecimal
5: ZSquare
6: ZStandard
7: ZTrig
8: ZInteger
9: ZoomStat
0: ZoomFit


TI-83/84 PLUS

[Graph screen not reproduced]

(See Example 4, page 83.)
TI-83/84 PLUS


EDIT CALC TESTS
1: 1-Var Stats
2: 2-Var Stats
3: Med-Med
4: LinReg(ax+b)
5: QuadReg
6: CubicReg
7: QuartReg


TI-83/84 PLUS


1-Var Stats L1


TI-83/84 PLUS


1-Var Stats
x̄= 41.5
Σx= 415
Σx²= 17311
Sx= 3.138581462
σx= 2.974894956
n= 10

TI-83/84 PLUS

[Calculator screens not reproduced]

CUMULATIVE REVIEW


Chapters 1 and 2


In Exercises 1 and 2, identify the sampling technique used and discuss potential 
sources of bias (if any). Explain. 


1. For quality assurance, every fortieth toothbrush is taken from each of four 
assembly lines and tested to make sure the bristles stay in the toothbrush. 


2. Using random digit dialing, researchers asked 1200 U.S. adults their thoughts 
on health care reform. 


3. In 2008, a worldwide study of all airlines found that baggage delays were 
caused by transfer baggage mishandling (49%), failure to load at originating 
airport (16%), arrival station mishandling (8%), space-weight restriction 
(6%), loading/offloading error (5%), tagging error (3%), and ticketing 
error/bag switch/security/other (13%). Use a Pareto chart to organize the
data.


In Exercises 4 and 5, determine whether the numerical value is a parameter or a 
statistic. Explain your reasoning. 


4. In 2009, the average salary of a Major League Baseball player was
$2,996,106.


5. In a recent survey of 1000 voters, 19% said that First Lady of the United 
States Michelle Obama will be very involved in policy decisions. 


6. The mean annual salary for a sample of electrical engineers is $83,500, with 
a standard deviation of $1500. The data set has a bell-shaped distribution. 


(a) Use the Empirical Rule to estimate the number of electrical engineers 
whose annual salaries are between $80,500 and $86,500. 


(b) If 40 additional electrical engineers were sampled, about how many of 
these electrical engineers would you expect to have annual salaries 
between $80,500 and $86,500? 


(c) The salaries of three randomly selected electrical engineers are $90,500, 
$79,750, and $82,600. Find the z-score that corresponds to each salary. 
According to the z-scores, would the salaries of any of these engineers 
be considered unusual? 


In Exercises 7 and 8, identify the population and the sample. 


7. A survey of career counselors at 195 colleges and universities found that 
90% of the students working with their offices were interested in federal 
jobs or internships. 


8. A study of 232,606 people was conducted to find a link between taking
antioxidant vitamins and living a longer life.


In Exercises 9 and 10, decide which method of data collection you would use to 
collect data for the study. Explain. 


9. A study of the years of service of the 100 members of the Senate 


10. A study of the effects of removing recess from schools 

In Exercises 11 and 12, determine whether the data are qualitative or quantitative
and identify the data set’s level of measurement. 


11. The number of games started by pitchers with at least one start for the 
New York Yankees in 2009 are listed. (Source: Major League Baseball) 


9 4 i 3 2 Bl 7 Y G 


12. The five top-earning states in 2008 by median income are listed. (Source: 
U.S. Census Bureau) 


1. Maryland 2. New Jersey 3. Connecticut 4. Alaska 5. Hawaii


13. The number of tornadoes by state in a recent year is listed. (a) Find the
data set’s five-number summary, (b) draw a box-and-whisker plot that
represents the data set, and (c) describe the shape of the distribution.
(Source: National Climatic Data Center)


81 1 #8 69 30 34 0 O 56 54 
2 6 21 14 46 136 17 23 2 O 
il > Wl Ws 2 i@ 4 i © F 
Ay) ey ee Ay i ail @) i! 

19 23 105 4 0 24 4 0 63 6 


14. Five test scores are given. The first four test scores are 15% of the final grade, 
and the last test score is 40% of the final grade. Find the weighted mean of 
the test scores. 


85 92 84 89 91 
15. Tail lengths (in feet) for a sample of American alligators are listed. 
6.5 3.4 4.2 7.1 5.4 6.8 7.5 3.9 4.6
(a) Find the mean, median, and mode of the tail lengths. Which best describes 
a typical American alligator tail length? Explain your reasoning. 
(b) Find the range, variance, and standard deviation of the data set. Interpret 


the results in the context of the real-life setting. 


16. A study shows that the number of deaths due to heart disease for women has 
decreased every year for the past five years. 


(a) Make an inference based on the results of the study. 
(b) What is wrong with this type of reasoning?


In Exercises 17-19, use the following data set. The data represent the points
scored by each player on the Montreal Canadiens in a recent NHL season.
(Source: National Hockey League)


| 6 a0 i ail @ 3 23 sy 2s 
Bs 23) 33) 28 BZ il iy iss ie iil 
il 9 @® 3 2a al it @ Bw 


17. Make a frequency distribution using eight classes. Include the class limits, 
midpoints, boundaries, frequencies, relative frequencies, and cumulative 
frequencies. 


18. Describe the shape of the distribution. 


19. Make a relative frequency histogram using the frequency distribution in 
Exercise 17. Then determine which class has the greatest relative frequency 
and which has the least relative frequency. 




PROBABILITY


3.1 Basic Concepts of Probability and Counting
    ACTIVITY

3.2 Conditional Probability and the Multiplication Rule

3.3 The Addition Rule
    ACTIVITY
    CASE STUDY

3.4 Additional Topics in Probability and Counting
    USES AND ABUSES
    REAL STATISTICS—REAL DECISIONS
    TECHNOLOGY


The television game show The Price 
Is Right presents a wide range of 
pricing games in which contestants 
compete for prizes using strategy, 
probability, and their knowledge 
of prices. One popular game is 
Spelling Bee. 

WHERE YOU'VE BEEN


In Chapters 1 and 2, you learned how to collect and describe data. Once the
data are collected and described, you can use the results to write summaries,
form conclusions, and make decisions. For instance, in Spelling Bee, contestants
have a chance to win a car by choosing lettered cards that spell CAR or by
choosing a single card that displays the entire word CAR. By collecting and
analyzing data, you can determine the chances of winning the car.

To play Spelling Bee, contestants choose from 30 cards. Eleven cards display
the letter C, eleven cards display A, six cards display R, and two cards display
CAR. Depending on how well contestants play the game, they can choose two,
three, four, or five cards.

Before the chosen cards are displayed, contestants are offered $1000 for each
card. If contestants choose the money, the game is over. If contestants choose
to try to win the car, the host displays one card. After a card is displayed,
contestants are offered $1000 for each remaining card. If they do not accept the
money, the host continues displaying cards. Play continues until contestants
take the money, spell the word CAR, display the word CAR, or display all cards
and do not spell CAR.


WHERE YOU’RE GOING


In Chapter 3, you will learn how to determine the probability of an event. For
instance, the following table shows the four ways that contestants on Spelling
Bee can win a car and the corresponding probabilities.

Event                               Probability
Winning by selecting two cards      57/435 ≈ 0.131
Winning by selecting three cards    151/406 ≈ 0.372
Winning by selecting four cards     1067/1827 ≈ 0.584
Winning by selecting five cards     52,363/71,253 ≈ 0.735

You can see from the table that choosing more cards gives you a better chance
of winning. These probabilities can be found using combinations, which will be
discussed in Section 3.4.
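
The Spelling Bee probabilities can be reproduced by counting combinations. The sketch below is one possible approach, not the book's own solution: it counts the losing hands (no CAR card and at least one of the letters C, A, R missing) by inclusion-exclusion and subtracts from 1. The function and constant names are hypothetical.

```python
from fractions import Fraction
from math import comb

# Card counts from the game description: 11 C, 11 A, 6 R, 2 CAR (30 in all).
C_CARDS, A_CARDS, R_CARDS, CAR_CARDS = 11, 11, 6, 2
LETTERS = C_CARDS + A_CARDS + R_CARDS   # 28 single-letter cards
TOTAL = LETTERS + CAR_CARDS             # 30 cards

def win_probability(k):
    """Probability that k cards chosen without replacement win the car."""
    # Hands drawn only from letter cards that are missing at least one letter.
    missing_one = (comb(LETTERS - C_CARDS, k)
                   + comb(LETTERS - A_CARDS, k)
                   + comb(LETTERS - R_CARDS, k))
    # Hands missing two letters consist of a single letter and were counted twice.
    missing_two = comb(C_CARDS, k) + comb(A_CARDS, k) + comb(R_CARDS, k)
    losing = missing_one - missing_two
    return 1 - Fraction(losing, comb(TOTAL, k))

for k in (2, 3, 4, 5):
    p = win_probability(k)
    print(f"{k} cards: {p} ~ {float(p):.3f}")
```

`Fraction` prints each result in lowest terms, so the two-card probability appears as 19/145, which equals 57/435.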

CHAPTER 3 PROBABILITY


WHAT YOU SHOULD LEARN


How to identify the sample space of a probability experiment and how to
identify simple events

How to use the Fundamental Counting Principle to find the number of ways
two or more events can occur

How to distinguish among classical probability, empirical probability, and
subjective probability

How to find the probability of the complement of an event

How to use a tree diagram and the Fundamental Counting Principle to find
more probabilities

STUDY TIP 


Here is a simple example of 
the use of the terms probability 
experiment, sample space, event, 
and outcome. 
Probability Experiment: Roll a six-sided die.
Sample Space: {1, 2, 3, 4, 5, 6}
Event: Roll an even number, {2, 4, 6}.
Outcome: Roll a 2, {2}.

Basic Concepts of Probability and Counting


Probability Experiments » The Fundamental Counting Principle
» Types of Probability » Complementary Events » Probability Applications


> PROBABILITY EXPERIMENTS 


When weather forecasters say that there is a 90% chance of rain or a physician 
says there is a 35% chance for a successful surgery, they are stating the likelihood, 
or probability, that a specific event will occur. Decisions such as “should you 
go golfing” or “should you proceed with surgery” are often based on these 
probabilities. In the previous chapter, you learned about the role of the 
descriptive branch of statistics. Because probability is the foundation of 
inferential statistics, it is necessary to learn about probability before proceeding 
to the second branch—inferential statistics. 


DEFINITION 


A probability experiment is an action, or trial, through which specific results 
(counts, measurements, or responses) are obtained. The result of a single trial 
in a probability experiment is an outcome. The set of all possible outcomes of 
a probability experiment is the sample space. An event is a subset of the 
sample space. It may consist of one or more outcomes. 
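
These terms map naturally onto sets. The following sketch is an illustration only; it uses the die-rolling experiment from the Study Tip, and the variable names are hypothetical.

```python
# Die-rolling experiment from the Study Tip, modeled with Python sets.
sample_space = {1, 2, 3, 4, 5, 6}                      # all possible outcomes
even_event = {n for n in sample_space if n % 2 == 0}   # event: roll an even number
outcome = 2                                            # result of a single trial

print(sorted(even_event))
print(even_event <= sample_space)   # True when the event is a subset of the sample space
print(outcome in even_event)        # True when the outcome belongs to the event
```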


EXAMPLE 1 


> Identifying the Sample Space of a Probability Experiment 


A probability experiment consists of tossing a coin and then rolling a six-sided 
die. Determine the number of outcomes and identify the sample space. 


> Solution 


There are two possible outcomes when tossing a coin: a head (H) or a tail (T). 
For each of these, there are six possible outcomes when rolling a die: 1, 2, 3, 4, 
5, or 6. A tree diagram gives a visual display of the outcomes of a probability 
experiment by using branches that originate from a starting point. It can be 
used to find the number of possible outcomes in a sample space as well as 
individual outcomes. 


Tree Diagram for Coin and Die Experiment 


H: H1 H2 H3 H4 H5 H6 
T: T1 T2 T3 T4 T5 T6 


From the tree diagram, you can see that the sample space has 12 outcomes. 


{H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6} 
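
The tree diagram in Example 1 can also be traced programmatically. The following is a minimal Python sketch, not part of the original example; the variable names are illustrative. It pairs each coin outcome with each die outcome, which is exactly what following every branch of the tree does.

```python
from itertools import product

coin = ["H", "T"]           # outcomes of the coin toss
die = [1, 2, 3, 4, 5, 6]    # outcomes of the die roll

# Each outcome pairs one coin result with one die result,
# which corresponds to one path through the tree diagram.
sample_space = [f"{c}{d}" for c, d in product(coin, die)]

print(len(sample_space))
print(sample_space)
```

The length of the list is the number of outcomes, and the list itself is the sample space.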
SURVEY 


Does your favorite team’s win or loss affect your mood? 


Check one: 
Yes 
No 
Not sure 


Source: Rasmussen 


> Try It Yourself 1 


For each probability experiment, determine the number of outcomes and 
identify the sample space. 


1. A probability experiment consists of recording a response to the survey 
statement at the left and the gender of the respondent. 

2. A probability experiment consists of recording a response to the survey 
statement at the left and the geographic location (Northeast, South, 
Midwest, West) of the respondent. 


a. Start a tree diagram by forming a branch for each possible response to the 
survey. 
b. At the end of each survey response branch, draw a new branch for each 
possible outcome. 
c. Find the number of outcomes in the sample space. 
d. List the sample space. Answer: Page A34 


In the rest of this chapter, you will learn how to calculate the probability or 
likelihood of an event. Events are often represented by uppercase letters, such 
as A, B, and C. An event that consists of a single outcome is called a 
simple event. In Example 1, the event “tossing heads and rolling a 3” is a simple 
event and can be represented as A = {H3}. In contrast, the event “tossing heads 
and rolling an even number” is not simple because it consists of three possible 
outcomes B = {H2, H4, H6}. 


EXAMPLE 2 


> Identifying Simple Events 
Determine the number of outcomes in each event. Then decide whether each 
event is simple or not. Explain your reasoning. 


1. For quality control, you randomly select a machine part from a batch that 
has been manufactured that day. Event A is selecting a specific defective 
machine part. 


2. You roll a six-sided die. Event B is rolling at least a 4. 


> Solution 


1. Event A has only one outcome: choosing the specific defective machine 
part. So, the event is a simple event. 


2. Event B has three outcomes: rolling a 4, a 5, or a 6. Because the event has 
more than one outcome, it is not simple. 


> Try It Yourself 2 


You ask for a student’s age at his or her last birthday. Determine the number 
of outcomes in each event. Then decide whether each event is simple or not. 
Explain your reasoning. 


1. Event C: The student’s age is between 18 and 23, inclusive. 
2. Event D: The student’s age is 20. 


a. Determine the number of outcomes in the event. 
b. State whether the event is simple or not. Explain your reasoning. 
Answer: Page A34 




> THE FUNDAMENTAL COUNTING PRINCIPLE 


In some cases, an event can occur in so many different ways that it is not 
practical to write out all the outcomes. When this occurs, you can rely on the 
Fundamental Counting Principle. The Fundamental Counting Principle can be 
used to find the number of ways two or more events can occur in sequence. 


THE FUNDAMENTAL COUNTING PRINCIPLE 


If one event can occur in m ways and a second event can occur in n ways, 
the number of ways the two events can occur in sequence is m · n. This rule 
can be extended to any number of events occurring in sequence. 


In words, the number of ways that events can occur in sequence is found by 
multiplying the number of ways one event can occur by the number of ways the 
other event(s) can occur. 


EXAMPLE 3 


» Using the Fundamental Counting Principle 

You are purchasing a new car. The possible manufacturers, car sizes, and colors 
are listed. 

Manufacturer: Ford, GM, Honda 
Car size: compact, midsize 
Color: white (W), red (R), black (B), green (G) 


How many different ways can you select one manufacturer, one car size, and 
one color? Use a tree diagram to check your result. 
> Solution 


There are three choices of manufacturers, two choices of car sizes, and four 
choices of colors. Using the Fundamental Counting Principle, you can 
conclude that the number of ways to select one manufacturer, one car size, 
and one color is 


3 · 2 · 4 = 24 ways. 


Using a tree diagram, you can see why there are 24 options. 


Tree Diagram for Car Selections 


Ford, GM, and Honda each branch to compact and midsize; each car size then 
branches to W, R, B, and G, giving 24 end branches. 
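
As a check on Example 3, the sketch below (an illustration added here, with assumed variable names) counts the selections two ways: by multiplying, as the Fundamental Counting Principle directs, and by listing every path of the tree diagram.

```python
from itertools import product

manufacturers = ["Ford", "GM", "Honda"]
sizes = ["compact", "midsize"]
colors = ["W", "R", "B", "G"]

# Fundamental Counting Principle: multiply the number of ways each event can occur.
count_by_principle = len(manufacturers) * len(sizes) * len(colors)

# Brute-force check: list every (manufacturer, size, color) path of the tree.
selections = list(product(manufacturers, sizes, colors))

print(count_by_principle, len(selections))
```

Both counts agree, which is why the principle can replace the tree diagram when listing every outcome is impractical.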


> Try It Yourself 3 
Your choices now include a Toyota and a tan car. How many different ways can 
you select one manufacturer, one car size, and one color? Use a tree diagram 
to check your result. 


a. Find the number of ways each event can occur. 
b. Use the Fundamental Counting Principle. 
c. Use a tree diagram to check your result. Answer: Page A35 
EXAMPLE 4 


» Using the Fundamental Counting Principle 


The access code for a car’s security system consists of four digits. Each digit can 
be any number from 0 through 9. 


Access Code: 1st digit, 2nd digit, 3rd digit, 4th digit 


How many access codes are possible if 
1. each digit can be used only once and not repeated? 
2. each digit can be repeated? 


3. each digit can be repeated but the first digit cannot be 0 or 1? 


> Solution 


1. Because each digit can be used only once, there are 10 choices for the first 
digit, 9 choices left for the second digit, 8 choices left for the third digit, and 
7 choices left for the fourth digit. Using the Fundamental Counting 
Principle, you can conclude that there are 


10 · 9 · 8 · 7 = 5040 
possible access codes. 


2. Because each digit can be repeated, there are 10 choices for each of the four 
digits. So, there are 


10 · 10 · 10 · 10 = 10^4 
= 10,000 
possible access codes. 


3. Because the first digit cannot be 0 or 1, there are 8 choices for the first digit. 
Then there are 10 choices for each of the other three digits. So, there are 


8 · 10 · 10 · 10 = 8000 


possible access codes. 
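
The three counts in Example 4 can be verified by brute force. This is a hedged Python sketch, not the text's method: it computes each product and then compares it with the number of codes actually generated. The variable names are assumptions made for illustration.

```python
import math
from itertools import permutations, product

digits = range(10)  # each position can hold 0 through 9

# 1. No digit repeated: 10 * 9 * 8 * 7 ordered choices.
no_repeats = math.perm(10, 4)

# 2. Digits may repeat: 10 choices for each of the four positions.
with_repeats = 10 ** 4

# 3. Digits may repeat, but the first digit cannot be 0 or 1.
restricted_first = 8 * 10 ** 3

# Brute-force checks: build the codes and count them.
listed_no_repeats = sum(1 for _ in permutations(digits, 4))
listed_with_repeats = sum(1 for _ in product(digits, repeat=4))
listed_restricted = sum(
    1 for code in product(digits, repeat=4) if code[0] not in (0, 1)
)

print(no_repeats, with_repeats, restricted_first)
print(listed_no_repeats, listed_with_repeats, listed_restricted)
```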


> Try It Yourself 4 

How many license plates can you make if a license plate consists of 

1. six (out of 26) alphabetical letters each of which can be repeated? 

2. six (out of 26) alphabetical letters each of which cannot be repeated? 


3. six (out of 26) alphabetical letters each of which can be repeated but the 
first letter cannot be A, B, C, or D? 


a. Identify each event and the number of ways each event can occur. 
b. Use the Fundamental Counting Principle. Answer: Page A35 
STUDY TIP 


Probabilities can be written as 
fractions, decimals, or percents. 
In Example 5, the probabilities 
are written as fractions and 
decimals, rounded when 
necessary to three places. 

This round-off rule will 
be used throughout 
the text. 


Standard Deck of Playing Cards 


Hearts   Diamonds   Spades   Clubs 
A ♥      A ♦        A ♠      A ♣ 
K ♥      K ♦        K ♠      K ♣ 
Q ♥      Q ♦        Q ♠      Q ♣ 
J ♥      J ♦        J ♠      J ♣ 
10 ♥     10 ♦       10 ♠     10 ♣ 
9 ♥      9 ♦        9 ♠      9 ♣ 
8 ♥      8 ♦        8 ♠      8 ♣ 
7 ♥      7 ♦        7 ♠      7 ♣ 
6 ♥      6 ♦        6 ♠      6 ♣ 
5 ♥      5 ♦        5 ♠      5 ♣ 
4 ♥      4 ♦        4 ♠      4 ♣ 
3 ♥      3 ♦        3 ♠      3 ♣ 
2 ♥      2 ♦        2 ♠      2 ♣ 


> TYPES OF PROBABILITY 


The method you will use to calculate a probability depends on the type of 
probability. There are three types of probability: classical probability, empirical 
probability, and subjective probability. The probability that event E will occur is 
written as P(E) and is read “the probability of event E.” 


DEFINITION 


Classical (or theoretical) probability is used when each outcome in a sample 
space is equally likely to occur. The classical probability for an event E is 
given by 


P(E) = (Number of outcomes in event E) / (Total number of outcomes in sample space) 


EXAMPLE 5 


» Finding Classical Probabilities 
You roll a six-sided die. Find the probability of each event. 


1. Event A: rolling a 3 
2. Event B: rolling a 7 


3. Event C: rolling a number less than 5 


> Solution 

When a six-sided die is rolled, the sample space consists of six outcomes: 
{1, 2, 3, 4, 5, 6}. 

1. There is one outcome in event A = {3}. So, 


P(rolling a 3) = 1/6 ≈ 0.167. 


2. Because 7 is not in the sample space, there are no outcomes in event B. So, 


P(rolling a 7) = 0/6 = 0. 


3. There are four outcomes in event C = {1, 2, 3, 4}. So, 


P(rolling a number less than 5) = 4/6 = 2/3 ≈ 0.667. 
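
The classical probability formula translates directly into a short function. The sketch below is illustrative only and assumes equally likely outcomes; the function name `classical_probability` is an assumption, not notation from the text. It reproduces the three results of Example 5 as exact fractions.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # outcomes of rolling a six-sided die

def classical_probability(event, space):
    """P(E) = outcomes in E / outcomes in the sample space (equally likely outcomes)."""
    favorable = event & space  # outcomes outside the sample space do not count
    return Fraction(len(favorable), len(space))

p_a = classical_probability({3}, sample_space)                              # rolling a 3
p_b = classical_probability({7}, sample_space)                              # rolling a 7
p_c = classical_probability({n for n in sample_space if n < 5}, sample_space)  # less than 5

print(p_a, p_b, p_c)
print(round(float(p_a), 3), round(float(p_c), 3))
```

Using `Fraction` keeps the results exact, so 4/6 is reduced to 2/3 automatically before it is rounded to three decimal places.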


> Try It Yourself 5 
You select a card from a standard deck. Find the probability of each event. 


1. Event D: Selecting a nine of clubs 
2. Event E: Selecting a heart 
3. Event F: Selecting a diamond, heart, club, or spade 


a. Identify the total number of outcomes in the sample space. 
b. Find the number of outcomes in the event. 
c. Use the classical probability formula. Answer: Page A35 
It seems as if no matter how strange an event is, somebody wants to know the 
probability that it will occur. The following table lists the probabilities that 
some intriguing events will happen. (Adapted from Life: The Odds) 

Being audited by the IRS         [illegible] 
New York Times best seller       0.0045 
Winning an Academy Award         0.000087 
Having your identity stolen      0.5% 
Spotting a UFO                   0.0000003 

Which of these events is most likely to occur? Least likely? 

To explore this topic further, see Activity 3.1 on page 144. 


When an experiment is repeated many times, regular patterns are formed. 
These patterns make it possible to find empirical probability. Empirical 
probability can be used even if each outcome of an event is not equally likely 
to occur. 


DEFINITION 


Empirical (or statistical) probability is based on observations obtained from 
probability experiments. The empirical probability of an event E is the relative 
frequency of event E. 


P(E) = (Frequency of event E) / (Total frequency) = f/n 


EXAMPLE 6 


> Finding Empirical Probabilities 

A company is conducting a telephone survey of randomly selected individuals 
to get their overall impressions of the past decade (2000s). So far, 1504 people 
have been surveyed. The frequency distribution shows the results. What is the 
probability that the next person surveyed has a positive overall impression of 
the 2000s? (Adapted from Princeton Survey Research Associates International) 


Response       Frequency, f 
Positive       406 
Negative       752 
Neither        316 
Don’t know     30 
               Σf = 1504 


> Solution 

The event is a response of “positive.” The frequency of this event is 406. 
Because the total of the frequencies is 1504, the empirical probability of the 
next person having a positive overall impression of the 2000s is 


P(positive) = 406/1504 ≈ 0.270. 
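
The relative-frequency calculation in Example 6 can be sketched in a few lines of Python. The dictionary below simply restates the survey frequencies given in the example; the function name `empirical_probability` is an illustrative assumption.

```python
# Survey frequencies from Example 6.
frequencies = {"Positive": 406, "Negative": 752, "Neither": 316, "Don't know": 30}

total = sum(frequencies.values())  # total frequency, n

def empirical_probability(response):
    """P(E) = f / n, the relative frequency of the response."""
    return frequencies[response] / total

p_positive = empirical_probability("Positive")
print(total, round(p_positive, 3))
```

Because every response falls into exactly one class, the empirical probabilities of the four responses add up to 1.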


> Try It Yourself 6 

An insurance company determines that in every 100 claims, 4 are fraudulent. 
What is the probability that the next claim the company processes will be 
fraudulent? 


a. Identify the event. Find the frequency of the event. 
b. Find the total frequency for the experiment. 
c. Find the empirical probability of the event. Answer: Page A35 
[Figure: Probability of Tossing a Head. Scatter plot of the proportion that are 
heads against the number of tosses.] 


Employee ages     Frequency, f 
15 to 24          54 
25 to 34          366 
35 to 44          233 
45 to 54          180 
55 to 64          125 
65 and over       42 
                  Σf = 1000 
As you increase the number of times a probability experiment is repeated, 
the empirical probability (relative frequency) of an event approaches the 
theoretical probability of the event. This is known as the law of large numbers. 


LAW OF LARGE NUMBERS 


As an experiment is repeated over and over, the empirical probability of an 
event approaches the theoretical (actual) probability of the event. 


As an example of this law, suppose you want to determine the probability of 
tossing a head with a fair coin. If you toss the coin 10 times and get only 
3 heads, you obtain an empirical probability of 3/10. Because you tossed the coin 
only a few times, your empirical probability is not representative of the 
theoretical probability, which is 1/2. If, however, you toss the coin several thousand 
times, then the law of large numbers tells you that the empirical probability will 
be very close to the theoretical or actual probability. 

The scatter plot at the left shows the results of simulating a coin toss 150 
times. Notice that, as the number of tosses increases, the probability of tossing a 
head gets closer and closer to the theoretical probability of 0.5. 
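
You can run a simulation like the one described above yourself. The following Python sketch is an illustration, not the simulation used for the scatter plot; the seed and the numbers of tosses are arbitrary choices. It tosses a simulated fair coin and reports the proportion of heads for increasingly long runs.

```python
import random

def proportion_of_heads(num_tosses, rng):
    """Toss a simulated fair coin num_tosses times and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

rng = random.Random(2024)  # arbitrary seed so the run can be repeated
for n in (10, 150, 10_000, 100_000):
    print(n, proportion_of_heads(n, rng))
```

Short runs can stray far from 0.5, as in the 3-heads-in-10-tosses example, while long runs tend to settle near the theoretical probability.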


EXAMPLE 7 


» Using Frequency Distributions to Find Probabilities 

You survey a sample of 1000 employees at a company and record the age of 
each. The results are shown in the frequency distribution at the left. If you 
randomly select another employee, what is the probability that the employee 
will be between 25 and 34 years old? 


> Solution 


The event is selecting an employee who is between 25 and 34 years old. The 
frequency of this event is 366. Because the total of the frequencies is 1000, the 
empirical probability of selecting an employee between the ages of 25 and 34 
years old is 


P(age 25 to 34) = 366/1000 = 0.366. 


> Try It Yourself 7 


Find the probability that an employee chosen at random will be between 15 
and 24 years old. 


a. Find the frequency of the event. 
b. Find the total of the frequencies. 


c. Find the empirical probability of the event. Answer: Page A35 


The third type of probability is subjective probability. Subjective probabilities 
result from intuition, educated guesses, and estimates. For instance, given a 
patient’s health and extent of injuries, a doctor may feel that the patient has a 
90% chance of a full recovery. Or a business analyst may predict that the chance 
of the employees of a certain company going on strike is 0.25. 
EXAMPLE 8 


> Classifying Types of Probability 


Classify each statement as an example of classical probability, empirical 
probability, or subjective probability. Explain your reasoning. 


1. The probability that you will get the flu this year is 0.1. 


2. The probability that a voter chosen at random will be younger than 35 years 
old is 0.3. 


3. The probability of winning a 1000-ticket raffle with one ticket is 1/1000. 


> Solution 


1. This probability is most likely based on an educated guess. It is an example 
of subjective probability. 


2. This statement is most likely based on a survey of a sample of voters, so it 
is an example of empirical probability. 


3. Because you know the number of outcomes and each is equally likely, this 
is an example of classical probability. 


> Try It Yourself 8 


Based on previous counts, the probability of a salmon successfully passing 
through a dam on the Columbia River is 0.85. Is this statement an example of 
classical probability, empirical probability, or subjective probability? (Source: 
Army Corps of Engineers) 


a. Identify the event. 

b. Decide whether the probability is determined by knowing all possible 
outcomes, whether the probability is estimated from the results of an 
experiment, or whether the probability is an educated guess. 

c. Make a conclusion. Answer: Page A35 


A probability cannot be negative or greater than 1. So, the probability of an 
event E is between 0 and 1, inclusive, as stated in the following rule. 


RANGE OF PROBABILITIES RULE 


The probability of an event E is between 0 and 1, inclusive. That is, 


0 ≤ P(E) ≤ 1 


If the probability of an event is 1, the event is certain to occur. If the 
probability of an event is 0, the event is impossible. A probability of 0.5 indicates 
that an event has an even chance of occurring. 

The following graph shows the possible range of probabilities and their 
meanings. 


Impossible     Unlikely     Even chance     Likely     Certain 
0                               0.5                        1 


An event that occurs with a probability of 0.05 or less is typically considered 
unusual. Unusual events are highly unlikely to occur. Later in this course you will 
identify unusual events when studying inferential statistics. 
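
The range of probabilities rule and the 0.05 guideline for unusual events can be expressed as two small checks. This is a minimal sketch with hypothetical function names; the cutoff of 0.05 is the guideline stated above, not a universal constant.

```python
def is_valid_probability(p):
    """Range of probabilities rule: 0 <= P(E) <= 1."""
    return 0 <= p <= 1

def is_unusual(p, cutoff=0.05):
    """An event is typically considered unusual when P(E) is 0.05 or less."""
    if not is_valid_probability(p):
        raise ValueError("a probability must be between 0 and 1, inclusive")
    return p <= cutoff

for value in (0.0002, 0.05, 0.5, 1.5):
    if is_valid_probability(value):
        print(value, "unusual" if is_unusual(value) else "not unusual")
    else:
        print(value, "cannot be a probability")
```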
The area of the rectangle represents the total probability of the sample 
space (1 = 100%). The area of the circle represents the probability of event E, 
and the area outside the circle represents the probability of the complement 
of event E. 


> COMPLEMENTARY EVENTS 


The sum of the probabilities of all outcomes in a sample space is 1 or 100%. An 
important result of this fact is that if you know the probability of an event E, you 
can find the probability of the complement of event E. 


DEFINITION 


The complement of event E is the set of all outcomes in a sample space that 
are not included in event E. The complement of event E is denoted by E’ and 
is read as “E prime.” 


For instance, if you roll a die and let E be the event “the number is at least 
5,” then the complement of E is the event “the number is less than 5.” In symbols, 
E = {5, 6} and E’ = {1, 2, 3, 4}. 

Using the definition of the complement of an event and the fact that the 
sum of the probabilities of all outcomes is 1, you can determine the following 
formulas. 


P(E) + P(E’) = 1     P(E) = 1 - P(E’)     P(E’) = 1 - P(E) 


The Venn diagram at the left illustrates the relationship between the sample 
space, an event E, and its complement E’. 


EXAMPLE 9 


> Finding the Probability of the Complement of an Event 

Use the frequency distribution in Example 7 to find the probability of 
randomly choosing an employee who is not between 25 and 34 years old. 

> Solution 


From Example 7, you know that 


P(age 25 to 34) = 366/1000 = 0.366. 


So, the probability that an employee is not between 25 and 34 years old is 


P(age is not 25 to 34) = 1 - 366/1000 = 634/1000 = 0.634. 
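
The complement calculation in Example 9 can be checked in two ways: by subtracting from 1, and by adding the frequencies of every other age class. The sketch below is illustrative; it restates the frequency distribution from Example 7 with assumed variable names.

```python
from fractions import Fraction

# Frequency distribution of employee ages from Example 7.
age_frequencies = {
    "15 to 24": 54, "25 to 34": 366, "35 to 44": 233,
    "45 to 54": 180, "55 to 64": 125, "65 and over": 42,
}
total = sum(age_frequencies.values())

# P(E') = 1 - P(E)
p_event = Fraction(age_frequencies["25 to 34"], total)
p_complement = 1 - p_event

# Direct check: count everyone who is not in the 25 to 34 class.
others = sum(f for ages, f in age_frequencies.items() if ages != "25 to 34")

print(p_complement, float(p_complement), Fraction(others, total))
```

Both routes give the same probability, which illustrates why P(E) + P(E’) = 1.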


> Try It Yourself 9 


Use the frequency distribution in Example 7 to find the probability of 
randomly choosing an employee who is not between 45 and 54 years old. 


a. Find the probability of randomly choosing an employee who is between 
45 and 54 years old. 
b. Subtract the resulting probability from 1. 


c. State the probability as a fraction and as a decimal. Answer: Page A35 
> PROBABILITY APPLICATIONS 


EXAMPLE 10 


> Using a Tree Diagram 


A probability experiment consists of tossing a coin and spinning the spinner 
shown at the left. The spinner is equally likely to land on each number. Use a 
tree diagram to find the probability of each event. 

1. Event A: tossing a tail and spinning an odd number 

2. Event B: tossing a head or spinning a number greater than 3 


> Solution From the tree diagram at the left, you can see that there are 16 
outcomes. 


Tree Diagram for Coin and Spinner Experiment 

H: H1 H2 H3 H4 H5 H6 H7 H8 
T: T1 T2 T3 T4 T5 T6 T7 T8 


1. There are four outcomes in event A = {T1, T3, T5, T7}. So, 

P(tossing a tail and spinning an odd number) = 4/16 = 1/4 = 0.25. 

2. There are 13 outcomes in event B = {H1, H2, H3, H4, H5, H6, H7, H8, T4, 
T5, T6, T7, T8}. So, 

P(tossing a head or spinning a number greater than 3) = 13/16 ≈ 0.813. 


> Try It Yourself 10 

Find the probability of tossing a tail and spinning a number less than 6. 

a. Find the number of outcomes in the event. 
b. Find the probability of the event. Answer: Page A35 


EXAMPLE 11 


> Using the Fundamental Counting Principle 

Your college identification number consists of eight digits. Each digit can be 
0 through 9 and each digit can be repeated. What is the probability of getting 
your college identification number when randomly generating eight digits? 


> Solution Because each digit can be repeated, there are 10 choices for 
each of the 8 digits. So, using the Fundamental Counting Principle, there are 
10 · 10 · 10 · 10 · 10 · 10 · 10 · 10 = 10^8 = 100,000,000 possible identification 
numbers. But only one of those numbers corresponds to your college 
identification number. So, the probability of randomly generating 8 digits and 
getting your college identification number is 1/100,000,000. 
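
Both applications in this section reduce to counting outcomes and dividing. The Python sketch below is an added illustration with assumed names. It lists the 16 coin-and-spinner outcomes from Example 10, counts the outcomes in events A and B, and then forms the probability from Example 11.

```python
from fractions import Fraction
from itertools import product

# Example 10: a coin toss followed by a spinner numbered 1 through 8.
outcomes = list(product("HT", range(1, 9)))

event_a = [(c, n) for c, n in outcomes if c == "T" and n % 2 == 1]  # tail and odd
event_b = [(c, n) for c, n in outcomes if c == "H" or n > 3]        # head or greater than 3

p_a = Fraction(len(event_a), len(outcomes))
p_b = Fraction(len(event_b), len(outcomes))
print(len(outcomes), p_a, p_b, float(p_b))

# Example 11: eight digits, each 0 through 9, repetition allowed.
possible_ids = 10 ** 8
p_id = Fraction(1, possible_ids)
print(possible_ids, p_id)
```

Note that event B uses "or," so an outcome such as H5 is counted once even though it satisfies both conditions.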


> Try It Yourself 11 


Your college identification number consists of nine digits. The first two digits 
of each number will be the last two digits of the year you are scheduled to 
graduate. The other digits can be any number from 0 through 9, and each digit 
can be repeated. What is the probability of getting your college identification 
number when randomly generating the other seven digits? 


a. Find the total number of possible identification numbers. Assume that you 
are scheduled to graduate in 2015. 
b. Find the probability of randomly generating your identification number. 
Answer: Page A35 
EXERCISES 


BUILDING BASIC SKILLS AND VOCABULARY 


1. What is the difference between an outcome and an event? 


2. Determine which of the following numbers could not represent the 
probability of an event. Explain your reasoning. 


(a) 33.3% (b) -15 (c) 0.0002 (d) 0 (e) 8 


3. Explain why the following statement is incorrect: The probability of rain 
tomorrow is 150%. 


4. When you use the Fundamental Counting Principle, what are you counting? 
5. Use your own words to describe the law of large numbers. Give an example. 


6. List the three formulas that can be used to describe complementary events. 


True or False? In Exercises 7-10, determine whether the statement is true or 
false. If it is false, rewrite it as a true statement. 


7. If you roll a six-sided die six times, you will roll an even number at least once. 


8. You toss a fair coin nine times and it lands tails up each time. The probability 
it will land heads up on the tenth flip is greater than 0.5. 


9. A probability of dh indicates an unusual event. 


10. If an event is almost certain to happen, its complement will be an unusual 
event. 


Matching Probabilities In Exercises 11-14, match the event with its 
probability. 


(a) 0.95 (b) 0.05 (c) 0.25 (d) 0 


11. You toss a coin and randomly select a number from 0 to 9. What is the 
probability of getting tails and selecting a 3? 


12. A random number generator is used to select a number from 1 to 100. What 
is the probability of selecting the number 153? 


13. A game show contestant must randomly select a door. One door doubles 
her money while the other three doors leave her with no winnings. What is 
the probability she selects the door that doubles her money? 


14. Five of the 100 digital video recorders (DVRs) in an inventory are known 
to be defective. What is the probability you randomly select an item that is 
not defective? 


USING AND INTERPRETING CONCEPTS 


Identifying a Sample Space In Exercises 15-20, identify the sample space of 
the probability experiment and determine the number of outcomes in the sample 
space. Draw a tree diagram if it is appropriate. 


15. Guessing the initial of a student’s middle name 
16. Guessing a student’s letter grade (A, B, C, D, F) in a class 


17. Drawing one card from a standard deck of cards 
18. Tossing three coins 


19. Determining a person’s blood type (A, B, AB, O) and Rh-factor (positive, 
negative) 


20. Rolling a pair of six-sided dice 


Recognizing Simple Events In Exercises 21-24, determine the number of 
outcomes in each event. Then decide whether the event is a simple event or not. 
Explain your reasoning. 


21. A computer is used to randomly select a number between 1 and 4000. 
Event A is selecting 253. 


22. A computer is used to randomly select a number between 1 and 4000. 
Event B is selecting a number less than 500. 


23. You randomly select one card from a standard deck. Event A is selecting 
an ace. 


24. You randomly select one card from a standard deck. Event B is selecting a 
ten of diamonds. 


25. Job Openings A software company is hiring for two positions: a software 
development engineer and a sales operations manager. How many ways can 
these positions be filled if there are 12 people applying for the engineering 
position and 17 people applying for the managerial position? 


26. Menu A restaurant offers a $12 dinner special that has 5 choices for an 
appetizer, 10 choices for entrées, and 4 choices for dessert. How many different 
meals are available if you select an appetizer, an entrée, and a dessert? 


27. Realty A realtor uses a lock box to store the keys for a house that is for sale. 
The access code for the lock box consists of four digits. The first digit cannot be 
zero and the last digit must be even. How many different codes are available? 


28. True or False Quiz Assuming that no questions are left unanswered, in how 
many ways can a six-question true-false quiz be answered? 


Classical Probabilities In Exercises 29-34, a probability experiment consists 
of rolling a 12-sided die. Find the probability of each event. 


29. Event A: rolling a 2 

30. Event B: rolling a 10 

31. Event C: rolling a number greater than 4 

32. Event D: rolling an even number 

33. Event E: rolling a prime number 

34. Event F: rolling a number divisible by 5 


Classifying Types of Probability In Exercises 35 and 36, classify the 
statement as an example of classical probability, empirical probability, or 
subjective probability. Explain your reasoning. 


35. According to company records, the probability that a washing machine will 
need repairs during a six-year period is 0.10. 


36. The probability of choosing 6 numbers from 1 to 40 that match the 6 numbers 
drawn by a state lottery is 1/3,838,380 ≈ 0.00000026. 
FIGURE FOR EXERCISES 41-44 


Day 1, Day 2, Day 3 tree diagram (S = sunny, R = rainy): 

S: SSS SSR SRS SRR 
R: RSS RSR RRS RRR 


FIGURE FOR EXERCISES 47-50 
Finding Probabilities In Exercises 37-40, consider a company that selects 
employees for random drug tests. The company uses a computer to randomly select 
employee numbers that range from 1 to 6296. 


37. Find the probability of selecting a number less than 1000. 

38. Find the probability of selecting a number greater than 1000. 

39. Find the probability of selecting a number divisible by 1000. 

40. Find the probability of selecting a number that is not divisible by 1000. 


Probability Experiment In Exercises 41-44, a probability experiment 
consists of rolling a six-sided die and spinning the spinner shown at the left. 
The spinner is equally likely to land on each color. Use a tree diagram to find the 
probability of each event. Then tell whether the event can be considered unusual. 


41. Event A: rolling a 5 and the spinner landing on blue 
42. Event B: rolling an odd number and the spinner landing on green 
43. Event C: rolling a number less than 6 and the spinner landing on yellow 


44. Event D: not rolling a number less than 6 and the spinner landing on yellow 


45. Security System The access code for a garage door consists of three digits. 
Each digit can be any number from 0 through 9, and each digit can be repeated. 
(a) Find the number of possible access codes. 


(b) What is the probability of randomly selecting the correct access code on 
the first try? 

(c) What is the probability of not selecting the correct access code on the 
first try? 


46. Security System An access code consists of a letter followed by four digits. 
Any letter can be used, the first digit cannot be 0, and the last digit must be 
even. 

(a) Find the number of possible access codes. 


(b) What is the probability of randomly selecting the correct access code on 
the first try? 


(c) What is the probability of not selecting the correct access code on the 
first try? 


Wet or Dry? You are planning a three-day trip to Seattle, Washington in 
October. In Exercises 47-50, use the tree diagram shown at the left to answer each 
question. 


47. List the sample space. 
48. List the outcome(s) of the event “It rains all three days.” 
49. List the outcome(s) of the event “It rains on exactly one day.” 


50. List the outcome(s) of the event “It rains on at least one day.” 


51. Sunny and Rainy Days You are planning a four-day trip to Seattle, 
Washington in October. 
(a) Make a sunny day/rainy day tree diagram for your trip. 
(b) List the sample space. 


(c) List the outcome(s) of the event “It rains on exactly one day.” 
52. Machine Part Suppliers Your company buys machine parts from three 
different suppliers. Make a tree diagram that shows the three suppliers and 
whether the parts they supply are defective. 


Graphical Analysis In Exercises 53 and 54, use the diagram to answer the 
question. 


53. What is the probability that a registered voter in Virginia voted in the 2009 
gubernatorial election? (Source: Commonwealth of Virginia State Board of 
Elections) 


FIGURE FOR EXERCISE 53: registered voters in Virginia who voted in the 2009 
Virginia gubernatorial election and registered voters who did not vote. 

FIGURE FOR EXERCISE 54: about 65,028,953 voted Democrat; about 57,930,888 
voted for another party. 


54. What is the probability that a voter chosen at random did not vote for 
a Democratic representative in the 2008 election? (Source: Federal Election 
Commission) 


Using a Frequency Distribution to Find Probabilities In Exercises 
55-58, use the frequency distribution at the left, which shows the number of 
American voters (in millions) according to age, to find the probability that a voter 
chosen at random is in the given age range. (Source: U.S. Census Bureau) 


Ages of voters     Frequency (in millions) 
18 to 20           5.8 
21 to 24           9.3 
25 to 34           22.7 
35 to 44           25.4 
45 to 64           54.9 
(final row illegible) 

TABLE FOR EXERCISES 55-58 


55. between 18 and 20 years old 56. between 35 and 44 years old 

57. not between 21 and 24 years old 58. not between 45 and 64 years old 


Using a Bar Graph to Find Probabilities In Exercises 59-62, use the 
following bar graph, which shows the highest level of education received by 
employees of a company. 


[Bar graph: Level of Education. Horizontal axis: Highest level of education.] 


Find the probability that the highest level of education for an employee chosen at 
random is 

59. a doctorate. 60. an associate’s degree. 
61. a master’s degree. 62. a high school diploma. 


63. Can any of the events in Exercises 55-58 be considered unusual? Explain. 


64. Can any of the events in Exercises 59-62 be considered unusual? Explain. 
Parents: Ssmm and SsMm

SSMm    SSmm
SSMm    SSmm
SsMm    Ssmm
SsMm    Ssmm
SsMm    Ssmm
SsMm    Ssmm
ssMm    ssmm
ssMm    ssmm

TABLE FOR EXERCISE 66 (the 16 cells of the Punnett square; the row and column labels are not legible in this copy)

[Pie chart: Workers (in thousands) by Industry for the U.S. Legible slices: Agriculture, forestry, fishing, and hunting, 2168; Mining and construction, 11,793. The Manufacturing value and any other slices are not legible.]

FIGURE FOR EXERCISES 67-70


65. Genetics A Punnett square is a diagram that shows all possible gene
combinations in a cross of parents whose genes are known. When two pink
snapdragon flowers (RW) are crossed, there are four equally likely possible
outcomes for the genetic makeup of the offspring: red (RR), pink (RW),
pink (WR), and white (WW), as shown in the Punnett square. If two pink
snapdragons are crossed, what is the probability that the offspring will be
(a) pink, (b) red, and (c) white?


66. Genetics There are six basic types of coloring in registered collies: sable
(SSmm), tricolor (ssmm), trifactored sable (Ssmm), blue merle (ssMm), sable
merle (SSMm), and trifactored sable merle (SsMm). The Punnett square at
the left shows the possible coloring of the offspring of a trifactored sable
merle collie and a trifactored sable collie. What is the probability that the
offspring will have the same coloring as one of its parents?


Using a Pie Chart to Find Probabilities In Exercises 67-70, use the pie
chart at the left, which shows the number of workers (in thousands) by industry for
the United States. (Source: U.S. Bureau of Labor Statistics)


67. Find the probability that a worker chosen at random was employed in the
services industry.


68. Find the probability that a worker chosen at random was employed in the
manufacturing industry.


69. Find the probability that a worker chosen at random was not employed in
the services industry.


70. Find the probability that a worker chosen at random was not employed in
the agriculture, forestry, fishing, and hunting industry.


71. College Football A stem-and-leaf plot for the number of touchdowns
scored by all NCAA Division I Football Bowl Subdivision teams is shown.
If a team is selected at random, find the probability the team scored (a) at
least 51 touchdowns, (b) between 20 and 30 touchdowns, inclusive, and
(c) more than 69 touchdowns. Are any of these events unusual? Explain.
(Source: NCAA)


Key: 1|8 = 18

1 | 889
2 | 113445566778899
3 | 01123333344444555555777788999
4 | 000012222334444444455555556666677788888999
5 | 000011222245556679
6 | 00122345689
7 | 67
72. Individual Stock Price An individual stock is selected at random from the
portfolio represented by the box-and-whisker plot shown. Find the
probability that the stock price is (a) less than $21, (b) between $21 and $50,
and (c) $30 or more.


[Box-and-whisker plot: Stock price (in dollars). The plotted values are not legible in this copy.]


Writing In Exercises 73 and 74, write a statement that represents the
complement of the given probability.


73. The probability of randomly choosing a tea drinker who has a college degree 
(Assume that you are choosing from the population of all tea drinkers.) 


74. The probability of randomly choosing a smoker whose mother also smoked 
(Assume that you are choosing from the population of all smokers.) 


EXTENDING CONCEPTS


75. Rolling a Pair of Dice You roll a pair of six-sided dice and record the sum. 


(a) List all of the possible sums and determine the probability of rolling each 
sum. 


(b) Use a technology tool to simulate rolling a pair of dice and recording the 
sum 100 times. Make a tally of the 100 sums and use these results to list 
the probability of rolling each sum. 

(c) Compare the probabilities in part (a) with the probabilities in part (b). 
Explain any similarities or differences. 


Odds In Exercises 76-81, use the following information. The chances of 
winning are often written in terms of odds rather than probabilities. The odds 
of winning is the ratio of the number of successful outcomes to the number of 
unsuccessful outcomes. The odds of losing is the ratio of the number of 
unsuccessful outcomes to the number of successful outcomes. For example, if the 
number of successful outcomes is 2 and the number of unsuccessful outcomes is 3, 
the odds of winning are 2:3 (read “2 to 3”) or 2/3.
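
The definition above can be sketched in a few lines of Python. This is only an illustration, not part of the text: the function names `odds_to_probability` and `probability_to_odds` are made up for this sketch, and it assumes all outcomes are equally likely, so that a probability is the number of successful outcomes divided by the total number of outcomes.

```python
from fractions import Fraction

def odds_to_probability(successes, failures):
    """Probability of winning when the odds of winning are successes:failures."""
    # Total outcomes = successful outcomes + unsuccessful outcomes.
    return Fraction(successes, successes + failures)

def probability_to_odds(p):
    """Odds of winning, as (successes, failures) in lowest terms."""
    p = Fraction(p)
    return p.numerator, p.denominator - p.numerator

# The example from the text: 2 successful and 3 unsuccessful outcomes.
print(odds_to_probability(2, 3))            # 2/5
print(probability_to_odds(Fraction(2, 5)))  # (2, 3)
```

Note that odds of 2:3 correspond to a probability of 2/5, not 2/3, because the probability divides by all five outcomes.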


76. A beverage company puts game pieces under the caps of its drinks and 
claims that one in six game pieces wins a prize. The official rules of the 
contest state that the odds of winning a prize are 1:6. Is the claim “one in six 
game pieces wins a prize” correct? Why or why not? 


77. The probability of winning an instant prize game is 1/10. The odds of winning
a different instant prize game are 1:10. If you want the best chance of 
winning, which game should you play? Explain your reasoning. 


78. The odds of an event occurring are 4:5. Find (a) the probability that the 
event will occur and (b) the probability that the event will not occur. 


79. A card is picked at random from a standard deck of 52 playing cards. Find 
the odds that it is a spade. 


80. A card is picked at random from a standard deck of 52 playing cards. Find 
the odds that it is not a spade. 


81. The odds of winning an event A are p:q. Show that the probability of event
A is given by $P(A) = \frac{p}{p + q}$.
Simulating the Stock Market


The Simulating the Stock Market applet allows you to investigate the probability
that the stock market will go up on any given day. The plot at the top left corner
shows the probability associated with each outcome. In this case, the market has a
50% chance of going up on any given day. When SIMULATE is clicked, outcomes
for n days are simulated. The results of the simulations are shown in the frequency
plot. If the animate option is checked, the display will show each outcome dropping
into the frequency plot as the simulation runs. The individual outcomes are shown
in the text field at the far right of the applet. The center plot shows in red the
cumulative proportion of times that the market went up. The green line in the plot
reflects the true probability of the market going up. As the experiment is conducted
over and over, the cumulative proportion should converge to the true value.
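
If the applet is not available, the same experiment can be approximated with a short Python sketch. This is an assumption-laden stand-in, not the applet's actual code: the function name `simulate_market`, the seed, and the choice of 10,000 days are all hypothetical, and each day is modeled as an independent event with a 50% chance of going up.

```python
import random

def simulate_market(n_days, p_up=0.5, seed=None):
    """Return the cumulative proportion of 'up' days after each simulated day."""
    rng = random.Random(seed)
    ups = 0
    proportions = []
    for day in range(1, n_days + 1):
        if rng.random() < p_up:   # the market goes up with probability p_up
            ups += 1
        proportions.append(ups / day)
    return proportions

props = simulate_market(10_000, seed=1)
for day in (10, 100, 1_000, 10_000):
    # The proportion should drift toward 0.5 as the number of days grows.
    print(day, round(props[day - 1], 3))
```

Early proportions can be far from 0.5; the later ones should settle near it, which is the behavior the red line in the applet's center plot is meant to show.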


[Applet screenshot: Probability Simulations, showing a probability of 0.5 for each outcome and the SIMULATE button.]


Explore


Step 1 Specify a value for n.
Step 2 Click SIMULATE four times.
Step 3 Click RESET.
Step 4 Specify another value for n.
Step 5 Click SIMULATE.


Draw Conclusions


1. Run the simulation using n = 1 without clicking RESET. How many days did
it take until there were three straight days on which the stock market went up?
three straight days on which the stock market went down?


2. Run the applet to simulate the stock market activity over the last 35 business
days. Find the empirical probability that the market goes up on day 36.
Conditional Probability and the Multiplication Rule


Conditional Probability > Independent and Dependent Events
> The Multiplication Rule


WHAT YOU SHOULD LEARN

> How to find the probability of an event given that another event has occurred
> How to distinguish between independent and dependent events
> How to use the Multiplication Rule to find the probability of two events
occurring in sequence
> How to use the Multiplication Rule to find conditional probabilities


> CONDITIONAL PROBABILITY


In this section, you will learn how to find the probability that two events occur in
sequence. Before you can find this probability, however, you must know how to
find conditional probabilities.


DEFINITION

A conditional probability is the probability of an event occurring, given that
another event has already occurred. The conditional probability of event B
occurring, given that event A has occurred, is denoted by P(B|A) and is read
as “probability of B, given A.”


EXAMPLE 1


> Finding Conditional Probabilities

1. Two cards are selected in sequence from a standard deck. Find the 
probability that the second card is a queen, given that the first card is a king. 
(Assume that the king is not replaced.) 


2. The table at the left shows the results of a study in which researchers 
examined a child’s IQ and the presence of a specific gene in the child. Find 
the probability that a child has a high IQ, given that the child has the gene. 


> Solution 


1. Because the first card is a king and is not replaced, the remaining deck has 
51 cards, 4 of which are queens. So, 


$P(B \mid A) = \frac{4}{51} \approx 0.078.$


So, the probability that the second card is a queen, given that the first card 
is a king, is about 0.078. 


2. There are 72 children who have the gene. So, the sample space consists of 
these 72 children, as shown at the left. Of these, 33 have a high IQ. So, 


$P(B \mid A) = \frac{33}{72} \approx 0.458.$

So, the probability that a child has a high IQ, given that the child has the
gene, is about 0.458.
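
The first part of the solution can be checked by brute force. The sketch below is an illustration only: it builds a hypothetical 52-card deck (the rank and suit labels are this sketch's own), lists every ordered pair of cards drawn without replacement, and then restricts the sample space to the draws whose first card is a king.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Every ordered draw of two cards without replacement (52 * 51 outcomes).
draws = list(permutations(deck, 2))

# Condition on event A: the first card is a king.
first_is_king = [d for d in draws if d[0][0] == "K"]
# Within that reduced sample space, count event B: the second card is a queen.
queen_after_king = [d for d in first_is_king if d[1][0] == "Q"]

p = Fraction(len(queen_after_king), len(first_is_king))
print(p, round(float(p), 3))  # 4/51 0.078
```

Conditioning on A shrinks the sample space first, and only then counts B, which is the same reasoning the solution uses with the 51 remaining cards.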
> Try It Yourself 1


1. Find the probability that a child does not have the gene. 
2. Find the probability that a child does not have the gene, given that the child 
has a normal IQ. 


a. Find the number of outcomes in the event and in the sample space. 
b. Divide the number of outcomes in the event by the number of outcomes in 
the sample space. Answer: Page A35 




Truman Collins, a probability and statistics enthusiast, wrote a program that
finds the probability of landing on each square of a Monopoly board during a
game. Collins explored various scenarios, including the effects of the Chance
and Community Chest cards and the various ways of landing in or getting out
of jail. Interestingly, Collins discovered that the length of each jail term
affects the probabilities.


Monopoly square    Probability given     Probability given
                   short jail term       long jail term
Go                 0.0310                0.0291
Chance             0.0087                0.0082
In Jail            0.0395                0.0946
Free Parking       0.0288                0.0283
Park Place         0.0219                0.0206
B&O RR             0.0307                0.0289
Water Works        0.0281                0.0265


Why do the probabilities depend on how long you stay in jail?
> INDEPENDENT AND DEPENDENT EVENTS


In some experiments, one event does not affect the probability of another.
For instance, if you roll a die and toss a coin, the outcome of the roll of the die
does not affect the probability of the coin landing on heads. These two events are
independent. The question of the independence of two or more events is important
to researchers in fields such as marketing, medicine, and psychology. You can
use conditional probabilities to determine whether events are independent.


DEFINITION 


Two events are independent if the occurrence of one of the events does not 
affect the probability of the occurrence of the other event. Two events A and 
B are independent if 


$P(B \mid A) = P(B)$ or if $P(A \mid B) = P(A)$.


Events that are not independent are dependent. 


To determine if A and B are independent, first calculate P(B), the 
probability of event B. Then calculate P(B| A), the probability of B, given A. 
If the values are equal, the events are independent. If P(B) ≠ P(B|A), then A
and B are dependent events. 
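
The test described above can be sketched in Python for the coin-and-die experiment mentioned earlier. The helper `prob` and the event names `A` and `B` are assumptions made for this illustration; the sketch simply lists the 12 equally likely outcomes and compares P(B) with P(B|A).

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing a coin and rolling a die: 12 equally likely outcomes.
sample_space = list(product("HT", range(1, 7)))

def prob(event, given=None):
    """P(event), or P(event | given) when a conditioning event is supplied."""
    space = sample_space if given is None else [o for o in sample_space if given(o)]
    return Fraction(sum(1 for o in space if event(o)), len(space))

A = lambda outcome: outcome[0] == "H"   # the coin lands on heads
B = lambda outcome: outcome[1] == 6     # the die shows a 6

print(prob(B), prob(B, given=A))        # 1/6 1/6
print(prob(B) == prob(B, given=A))      # True, so A and B are independent
```

Because the two values agree, knowing that the coin landed on heads tells you nothing new about the die.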


EXAMPLE 2 


> Classifying Events as Independent or Dependent 
Decide whether the events are independent or dependent. 


1. Selecting a king from a standard deck (A), not replacing it, and then 
selecting a queen from the deck (B) 


2. Tossing a coin and getting a head (A), and then rolling a six-sided die and 
obtaining a 6 (B) 


3. Driving over 85 miles per hour (A), and then getting in a car accident (B) 


> Solution 


1. $P(B \mid A) = \frac{4}{51}$ and $P(B) = \frac{4}{52}$. The occurrence of A changes the probability
of the occurrence of B, so the events are dependent.


2. $P(B \mid A) = \frac{1}{6}$ and $P(B) = \frac{1}{6}$. The occurrence of A does not change the
probability of the occurrence of B, so the events are independent.


3. If you drive over 85 miles per hour, the chances of getting in a car accident 
are greatly increased, so these events are dependent. 


> Try It Yourself 2 
Decide whether the events are independent or dependent. 


1. Smoking a pack of cigarettes per day (A) and developing emphysema, a 
chronic lung disease (B) 
2. Exercising frequently (A) and having a 4.0 grade point average (B) 


a. Decide whether the occurrence of the first event affects the probability of
the second event.
b. State if the events are independent or dependent. Answer: Page A35
> THE MULTIPLICATION RULE


To find the probability of two events occurring in sequence, you can use the
Multiplication Rule.


THE MULTIPLICATION RULE FOR THE PROBABILITY OF A AND B

The probability that two events A and B will occur in sequence is

$P(A \text{ and } B) = P(A) \cdot P(B \mid A).$

If events A and B are independent, then the rule can be simplified to
$P(A \text{ and } B) = P(A) \cdot P(B)$. This simplified rule can be extended to any
number of independent events.


STUDY TIP

In words, to use the Multiplication Rule,

1. find the probability that the first event occurs,
2. find the probability that the second event occurs given that the first
event has occurred, and
3. multiply these two probabilities.


EXAMPLE 3


> Using the Multiplication Rule to Find Probabilities
1. Two cards are selected, without replacing the first card, from a standard 
deck. Find the probability of selecting a king and then selecting a queen. 
2. A coin is tossed and a die is rolled. Find the probability of tossing a head 
and then rolling a 6. 
> Solution
1. Because the first card is not replaced, the events are dependent.

$P(K \text{ and } Q) = P(K) \cdot P(Q \mid K) = \frac{4}{52} \cdot \frac{4}{51} = \frac{16}{2652} \approx 0.006$

So, the probability of selecting a king and then a queen is about 0.006.
2. The events are independent.

$P(H \text{ and } 6) = P(H) \cdot P(6) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12} \approx 0.083$

So, the probability of tossing a head and then rolling a 6 is about 0.083.
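
The arithmetic in this solution can be reproduced with exact fractions. The sketch below is only a check of the two products; the variable names are chosen for this illustration.

```python
from fractions import Fraction

# Part 1: dependent events, so the second factor is a conditional probability.
p_king = Fraction(4, 52)
p_queen_given_king = Fraction(4, 51)
p_king_then_queen = p_king * p_queen_given_king
print(p_king_then_queen, round(float(p_king_then_queen), 3))  # 4/663 0.006

# Part 2: independent events, so the plain probabilities are multiplied.
p_head_and_six = Fraction(1, 2) * Fraction(1, 6)
print(p_head_and_six, round(float(p_head_and_six), 3))        # 1/12 0.083
```

Here 4/663 is 16/2652 reduced to lowest terms.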
> Try It Yourself 3 
1. The probability that a salmon swims successfully through a dam is 0.85. Find 
the probability that two salmon swim successfully through the dam. 
2. Two cards are selected from a standard deck without replacement. Find the 
probability that they are both hearts. 
a. Decide if the events are independent or dependent. 
b. Use the Multiplication Rule to find the probability. Answer: Page A35 
EXAMPLE 4


> Using the Multiplication Rule to Find Probabilities
The probability that a particular knee surgery is successful is 0.85.


1. Find the probability that three knee surgeries are successful.
2. Find the probability that none of the three knee surgeries are successful.
3. Find the probability that at least one of the three knee surgeries is successful.


> Solution


1. The probability that each knee surgery is successful is 0.85. The chance of
success for one surgery is independent of the chances for the other surgeries.

$P(\text{three surgeries are successful}) = (0.85)(0.85)(0.85) \approx 0.614$

So, the probability that all three surgeries are successful is about 0.614.


2. Because the probability of success for one surgery is 0.85, the probability of
failure for one surgery is 1 - 0.85 = 0.15.

$P(\text{none of the three are successful}) = (0.15)(0.15)(0.15) \approx 0.003$

So, the probability that none of the surgeries are successful is about 0.003.
Because 0.003 is less than 0.05, this can be considered an unusual event.


3. The phrase “at least one” means one or more. The complement to the event
“at least one is successful” is the event “none are successful.” Use the
complement to find the probability.

$P(\text{at least one is successful}) = 1 - P(\text{none are successful}) \approx 1 - 0.003 = 0.997$

So, the probability that at least one of the three surgeries is successful is
about 0.997.
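
The three results can be reproduced in a few lines of Python. This sketch assumes, as the solution does, that the surgeries are independent; the variable names are this illustration's own.

```python
p_success = 0.85   # probability that one knee surgery is successful
n = 3              # number of independent surgeries

p_all = p_success ** n          # all three succeed
p_none = (1 - p_success) ** n   # all three fail
p_at_least_one = 1 - p_none     # complement of "none are successful"

print(round(p_all, 3), round(p_none, 3), round(p_at_least_one, 3))  # 0.614 0.003 0.997
print(p_none < 0.05)  # True: "none are successful" is an unusual event
```

Computing "at least one" directly would mean adding the cases of exactly one, exactly two, and exactly three successes; the complement needs only one product.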


> Try It Yourself 4


The probability that a particular rotator cuff surgery is successful is 0.9.
(Source: The Orthopedic Center of St. Louis)


1. Find the probability that three rotator cuff surgeries are successful.
2. Find the probability that none of the three rotator cuff surgeries are
successful.
3. Find the probability that at least one of the three rotator cuff surgeries is
successful.


a. Decide whether to find the probability of the event or its complement.
b. Use the Multiplication Rule to find the probability. If necessary, use the
complement.
c. Determine if the event is unusual. Explain. Answer: Page A35


In Example 4, you were asked to find a probability using the phrase “at least
one.” Notice that it was easier to find the probability of its complement, “none,”
and then subtract the probability of its complement from 1.
EXAMPLE 5


[Figure: Medical School. Nested regions: U.S. medical school seniors; seniors matched with residency positions; seniors matched with one of their top three choices.]

[Figure: Jury Selection. Nested regions: jury selection pool; female; works in a health field.]


> Using the Multiplication Rule to Find Probabilities


More than 15,000 U.S. medical school seniors applied to residency programs 
in 2009. Of those, 93% were matched with residency positions. Eighty-two 
percent of the seniors matched with residency positions were matched with 
one of their top three choices. Medical students electronically rank the 
residency programs in their order of preference, and program directors across 
the United States do the same. The term “match” refers to the process 
whereby a student’s preference list and a program director’s preference list 
overlap, resulting in the placement of the student in a residency position. 
(Source: National Resident Matching Program) 


1. Find the probability that a randomly selected senior was matched with a 
residency position and it was one of the senior’s top three choices. 


2. Find the probability that a randomly selected senior who was matched with 
a residency position did not get matched with one of the senior’s top 
three choices. 


3. Would it be unusual for a randomly selected senior to be matched with a 
residency position and that it was one of the senior’s top three choices? 
> Solution 


Let A = {matched with residency position} and B = {matched with one of 
top three choices}. So, P(A) = 0.93 and P(B|A) = 0.82. 


1. The events are dependent. 
$P(A \text{ and } B) = P(A) \cdot P(B \mid A) = (0.93) \cdot (0.82) \approx 0.763$


So, the probability that a randomly selected senior was matched with one of 
the senior’s top three choices is about 0.763. 


2. To find this probability, use the complement. 
$P(B' \mid A) = 1 - P(B \mid A) = 1 - 0.82 = 0.18$


So, the probability that a randomly selected senior was matched with a 
residency position that was not one of the senior’s top three choices is 0.18. 


3. It is not unusual because the probability of a senior being matched with a 
residency position that was one of the senior’s top three choices is about 
0.763, which is greater than 0.05. 
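
As a rough cross-check of part 1, the two-stage process can also be simulated. The sketch below is an assumption-based illustration, not part of the source data: the function name `estimate_joint`, the seed, and the number of trials are hypothetical. Each simulated senior is first matched with probability 0.93 and, only if matched, receives a top-three choice with probability 0.82.

```python
import random

def estimate_joint(p_a, p_b_given_a, trials=100_000, seed=0):
    """Estimate P(A and B) by simulating A first, then B given A."""
    rng = random.Random(seed)
    both = 0
    for _ in range(trials):
        a_occurs = rng.random() < p_a
        # B is only simulated when A has occurred, mirroring P(B|A).
        if a_occurs and rng.random() < p_b_given_a:
            both += 1
    return both / trials

estimate = estimate_joint(0.93, 0.82)
print(round(estimate, 3))   # should land close to 0.93 * 0.82
print(round(0.93 * 0.82, 3))
```

The simulated proportion will vary slightly with the seed, but it should stay close to the product given by the Multiplication Rule.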


> Try It Yourself 5 


In a jury selection pool, 65% of the people are female. Of these 65%, one out 
of four works in a health field. 


1. Find the probability that a randomly selected person from the jury pool is 
female and works in a health field. 

2. Find the probability that a randomly selected person from the jury pool is 
female and does not work in a health field. 


a. Determine events A and B. 

b. Use the Multiplication Rule to write a formula to find the probability. If 
necessary, use the complement. 

c. Calculate the probability. Answer: Page A35 
BUILDING BASIC SKILLS AND VOCABULARY


1. What is the difference between independent and dependent events?

2. List examples of
(a) two events that are independent.
(b) two events that are dependent.

3. What does the notation P(B|A) mean?

4. Explain how the complement can be used to find the probability of getting
at least one item of a particular type.


True or False? In Exercises 5 and 6, determine whether the statement is true or
false. If it is false, rewrite it as a true statement.


5. If two events are independent, P(A|B) = P(B).
6. If events A and B are dependent, then P(A and B) = P(A) · P(B).


Classifying Events In Exercises 7-12, decide whether the events are
independent or dependent. Explain your reasoning.


7. Selecting a king from a standard deck, replacing it, and then selecting a
queen from the deck

8. Returning a rented movie after the due date and receiving a late fee

9. A father having hazel eyes and a daughter having hazel eyes

10. Not putting money in a parking meter and getting a parking ticket

11. Rolling a six-sided die and then rolling the die a second time so that the sum
of the two rolls is five

12. A ball numbered from 1 through 52 is selected from a bin, replaced, and then
a second numbered ball is selected from the bin.


Classifying Events Based on Studies In Exercises 13-16, identify the two
events described in the study. Do the results indicate that the events are independent
or dependent? Explain your reasoning.


13. A study found that people who suffer from moderate to severe sleep apnea
are at increased risk of having high blood pressure. (Source: Journal of the
American Medical Association)

14. Stress causes the body to produce higher amounts of acid, which can irritate
already existing ulcers. But, stress does not cause stomach ulcers. (Source:
Baylor College of Medicine)

15. Studies found that exposure to everyday sources of aluminum does not cause
Alzheimer’s disease. (Source: Alzheimer’s Association)

16. According to researchers, diabetes is rare in societies in which obesity is rare.
In societies in which obesity has been common for at least 20 years, diabetes
is also common. (Source: American Diabetes Association)
USING AND INTERPRETING CONCEPTS


17. BRCA Gene In the general population, one woman in eight will develop
breast cancer. Research has shown that approximately 1 woman in 600 carries
a mutation of the BRCA gene. About 6 out of 10 women with this mutation
develop breast cancer. (Adapted from Susan G. Komen Breast Cancer Foundation)

(a) Find the probability that a randomly selected woman will develop breast
cancer, given that she has a mutation of the BRCA gene.

(b) Find the probability that a randomly selected woman will carry the
mutation of the BRCA gene and will develop breast cancer.

(c) Are the events “carrying this mutation” and “developing breast cancer”
independent or dependent? Explain.


[Figure for Exercise 17: Breast Cancer and the BRCA Gene. Regions: women; women with mutated BRCA gene; women who develop breast cancer.]

[Figure for Exercise 18: What Do You Drive? Regions: adults surveyed; adults who drive pickup trucks; adults who drive Fords.]


18. Pickup Trucks In a survey, 510 adults were asked if they drive a pickup
truck and if they drive a Ford. The results showed that one in six adults
surveyed drives a pickup truck, and three in ten adults surveyed drive a Ford.
Of the adults surveyed that drive Fords, two in nine drive a pickup truck.

(a) Find the probability that a randomly selected adult drives a pickup truck,
given that the adult drives a Ford.

(b) Find the probability that a randomly selected adult drives a Ford and
drives a pickup truck.

(c) Are the events “driving a Ford” and “driving a pickup truck” independent
or dependent? Explain.


19. Summer Vacation The table shows the results of a survey in which 146
families were asked if they own a computer and if they will be taking a
summer vacation during the current year.

(a) Find the probability that a randomly selected family is not taking a
summer vacation this year.

(b) Find the probability that a randomly selected family owns a computer.

(c) Find the probability that a randomly selected family is taking a summer
vacation this year, given that they own a computer.

(d) Find the probability that a randomly selected family is taking a summer
vacation this year and owns a computer.

(e) Are the events “owning a computer” and “taking a summer vacation this
year” independent or dependent events? Explain.
20. Nursing Majors The table shows the number of male and female students
enrolled in nursing at the University of Oklahoma Health Sciences Center 
for a recent semester. (Source: University of Oklahoma Health Sciences Center 
Office of Institutional Research) 


151 1104 1255 
1016 1693 2709 


1167 2797 3964 


(a) Find the probability that a randomly selected student is a nursing major. 
(b) Find the probability that a randomly selected student is male. 


(c) Find the probability that a randomly selected student is a nursing major, 
given that the student is male. 


(d) Find the probability that a randomly selected student is a nursing major 
and male. 


(e) Are the events “being a male student” and “being a nursing major” 
independent or dependent events? Explain. 
21. Assisted Reproductive Technology A study found that 37% of the assisted
reproductive technology (ART) cycles resulted in pregnancies. Twenty-five
percent of the ART pregnancies resulted in multiple births. (Source: National
Center for Chronic Disease Prevention and Health Promotion)


[Figure for Exercise 21: Pregnancies. Regions: ART cycles; pregnancies; multiple births.]


(a) Find the probability that a randomly selected ART cycle resulted in a
pregnancy and produced a multiple birth.

(b) Find the probability that a randomly selected ART cycle that resulted in
a pregnancy did not produce a multiple birth.

(c) Would it be unusual for a randomly selected ART cycle to result in a
pregnancy and produce a multiple birth? Explain.


22. Government According to a survey, 86% of adults in the United States
think the U.S. government system is broken. Of these 86%, about 8 out of
10 think the government can be fixed. (Adapted from CNN/Opinion Research
Corporation)


[Figure for Exercise 22: Government. Regions: adults in the U.S.; adults who think the U.S. government system is broken; adults who think the government can be fixed.]


(a) Find the probability that a randomly selected adult thinks the U.S.
government system is broken and thinks the government can be fixed.

(b) Given that a randomly selected adult thinks the U.S. government system
is broken, find the probability that he or she thinks the government
cannot be fixed.

(c) Would it be unusual for a randomly selected adult to think the U.S.
government system is broken and think the government can be fixed?
Explain.


23. Computers and Internet Access A study found that 81% of households 
in the United States have computers. Of those 81%, 92% have Internet 
access. Find the probability that a U.S. household selected at random has a 
computer and has Internet access. (Source: The Nielsen Company) 


24. Surviving Surgery A doctor gives a patient a 60% chance of surviving 
bypass surgery after a heart attack. If the patient survives the surgery, he has 
a 50% chance that the heart damage will heal. Find the probability that the 
patient survives surgery and the heart damage heals. 
25. People Who Can Wiggle Their Ears In a sample of 1000 people, 130 can wiggle
their ears. Two unrelated people are selected at random without replacement.

(a) Find the probability that both people can wiggle their ears.

(b) Find the probability that neither person can wiggle his or her ears.

(c) Find the probability that at least one of the two people can wiggle his or
her ears.

(d) Which of the events can be considered unusual? Explain.


26. Batteries Sixteen batteries are tested to see if they last as long as the
manufacturer claims. Four batteries fail the test. Two batteries are selected
at random without replacement.

(a) Find the probability that both batteries fail the test.

(b) Find the probability that both batteries pass the test.

(c) Find the probability that at least one battery fails the test.

(d) Which of the events can be considered unusual? Explain.


27. Emergency Savings The table shows the results of a survey in which
142 male and 145 female workers ages 25 to 64 were asked if they had at least
one month’s income set aside for emergencies.

(a) Find the probability that a randomly selected worker has one month’s
income or more set aside for emergencies.

(b) Given that a randomly selected worker is a male, find the probability
that the worker has less than one month’s income.

(c) Given that a randomly selected worker has one month’s income or more,
find the probability that the worker is a female.

(d) Are the events “having less than one month’s income saved” and “being
male” independent or dependent? Explain.


28. Health Care for Dogs The table shows the results of a survey in which
90 dog owners were asked how much they had spent in the last year for their
dog’s health care, and whether their dogs were purebred or mixed breeds.


19 21 40 
35 15 50 
54 36 90 


(a) Find the probability that $100 or more was spent on a randomly selected 
dog’s health care in the last year. 

(b) Given that a randomly selected dog owner spent less than $100, find the 
probability that the dog was a mixed breed. 

(c) Find the probability that a randomly selected dog owner spent $100 or 
more on health care and the dog was a mixed breed. 

(d) Are the events “spending $100 or more on health care” and “having a 
mixed breed dog” independent or dependent? Explain. 
29. Blood Types The probability that a person in the United States has type
B+ blood is 9%. Five unrelated people in the United States are selected at
random. (Source: American Association of Blood Banks)

(a) Find the probability that all five have type B+ blood.

(b) Find the probability that none of the five have type B+ blood.

(c) Find the probability that at least one of the five has type B+ blood.


30. Blood Types The probability that a person in the United States has type
A+ blood is 31%. Three unrelated people in the United States are selected at
random. (Source: American Association of Blood Banks)

(a) Find the probability that all three have type A+ blood.

(b) Find the probability that none of the three have type A+ blood.

(c) Find the probability that at least one of the three has type A+ blood.


31. Guessing A multiple-choice quiz has five questions, each with four answer
choices. Only one of the choices is correct. You have no idea what the answer
is to any question and have to guess each answer.

(a) Find the probability of answering the first question correctly. 

(b) Find the probability of answering the first two questions correctly. 

(c) Find the probability of answering all five questions correctly. 

(d) Find the probability of answering none of the questions correctly. 

(e) Find the probability of answering at least one of the questions correctly. 


32. Bookbinding Defects A printing company’s bookbinding machine has a
probability of 0.005 of producing a defective book. This machine is used to
bind three books.

(a) Find the probability that none of the books are defective. 

(b) Find the probability that at least one of the books is defective. 

(c) Find the probability that all of the books are defective. 


33. Warehouses A distribution center receives shipments of a product from
three different factories in the following quantities: 50, 35, and 25. Three
times a product is selected at random, each time without replacement. Find
the probability that (a) all three products came from the third factory and
(b) none of the three products came from the third factory.


34. Birthdays Three people are selected at random. Find the probability that 
(a) all three share the same birthday and (b) none of the three share the 
same birthday. Assume 365 days in a year. 


EXTENDING CONCEPTS 


According to Bayes’ Theorem, the probability of event A, given that event B has 
occurred, is 


\[
P(A \mid B) = \frac{P(A) \cdot P(B \mid A)}{P(A) \cdot P(B \mid A) + P(A') \cdot P(B \mid A')}
\]
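The formula can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function name `bayes_posterior` and the input values are illustrative assumptions and are not taken from the exercises.

```python
def bayes_posterior(p_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) from P(A), P(B|A), and P(B|A') using Bayes' Theorem."""
    p_not_a = 1.0 - p_a                      # P(A') is the complement of P(A)
    numerator = p_a * p_b_given_a            # P(A) * P(B|A)
    denominator = numerator + p_not_a * p_b_given_not_a
    return numerator / denominator


# Illustrative values only (hypothetical, not from Exercises 35-38).
posterior = bayes_posterior(p_a=0.3, p_b_given_a=0.6, p_b_given_not_a=0.2)
print(round(posterior, 4))
```

Only P(A) is passed in because P(A') is always 1 − P(A).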


In Exercises 35-38, use Bayes’ Theorem to find P(A|B). 


35. P(A) = 3, P(A’) =4, P(B|A) =3, and P(B|A’) = 3 

36. P(A) = 3, P(A’) = 3, P(B|A) = 3, and P(B|A’) =} 

37. P(A) = 0.25, P(A’) = 0.75, P(B|A) = 0.3, and P(B|A’) = 0.5 

38. P(A) = 0.62, P(A’) = 0.38, P(B|A) = 0.41, and P(B|A’) = 0.17 

39. Reliability of Testing A certain virus infects one in every 200 people. A test 
used to detect the virus in a person is positive 80% of the time if the person 
has the virus and 5% of the time if the person does not have the virus. (This 
5% result is called a false positive.) Let A be the event “the person is 
infected” and B be the event “the person tests positive.” 


(a) Using Bayes’ Theorem, if a person tests positive, determine the 
probability that the person is infected. 

(b) Using Bayes’ Theorem, if a person tests negative, determine the 
probability that the person is not infected. 


40. Birthday Problem You are in a class that has 24 students. You want to find 
the probability that at least two of the students share the same birthday. 


(a) First, find the probability that each student has a different birthday. 

\[
P(\text{different birthdays}) = \underbrace{\frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdot \frac{362}{365} \cdots \frac{343}{365} \cdot \frac{342}{365}}_{24 \text{ factors}}
\]


(b) The probability that at least two students have the same birthday is the 
complement of the probability in part (a). What is this probability? 


(c) We used a technology tool to generate 24 random numbers between 
1 and 365. Each number represents a birthday. Did we get at least two 
people with the same birthday? 


228 348 181 317 81 183 

52 346 177 118 315 273 
252 168 281 266 285 13 
118 360 8 193 57 107 


(d) Use a technology tool to simulate the “Birthday Problem.” Repeat the 
simulation 10 times. How many times did you get at least two people 
with the same birthday? 
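One possible "technology tool" for parts (a), (b), and (d) is a short Python script. The sketch below is an illustration under stated assumptions, not the tool used in the text: the helper names, the fixed seed, and the choice of 10,000 trials are all hypothetical, and birthdays are assumed to be equally likely over 365 days.

```python
import random


def p_all_different(n_people: int, days: int = 365) -> float:
    """Product (365/365)(364/365)... with n_people factors, as in part (a)."""
    prob = 1.0
    for k in range(n_people):
        prob *= (days - k) / days
    return prob


def has_shared_birthday(n_people: int, rng: random.Random, days: int = 365) -> bool:
    """Draw one random class and report whether any birthday repeats."""
    birthdays = [rng.randint(1, days) for _ in range(n_people)]
    return len(set(birthdays)) < n_people


def simulate(n_people: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated classes with at least one shared birthday."""
    rng = random.Random(seed)
    hits = sum(has_shared_birthday(n_people, rng) for _ in range(trials))
    return hits / trials


exact_shared = 1.0 - p_all_different(24)   # complement rule, as in part (b)
estimate = simulate(24, trials=10_000)
print(f"exact:     {exact_shared:.3f}")
print(f"simulated: {estimate:.3f}")
```

To mirror part (d) literally, call `simulate(24, trials=10)` and count the hits; with so few trials the estimate can vary widely from run to run.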


The Multiplication Rule and Conditional Probability By rewriting 
the formula for the Multiplication Rule, you can write a formula for finding 
conditional probabilities. The conditional probability of event B occurring, given 
that event A has occurred, is 


\[
P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}
\]
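As a quick check of this formula, the sketch below counts outcomes for one roll of a fair die. The events chosen here (A: the roll is even, B: the roll is greater than 3) are hypothetical illustrations, not part of the exercises.

```python
from fractions import Fraction

sample_space = set(range(1, 7))                      # outcomes of one fair die roll
event_a = {n for n in sample_space if n % 2 == 0}    # A: roll is even
event_b = {n for n in sample_space if n > 3}         # B: roll is greater than 3


def prob(event: set) -> Fraction:
    """Classical probability: outcomes in the event / outcomes in the sample space."""
    return Fraction(len(event), len(sample_space))


# P(B|A) = P(A and B) / P(A); set intersection plays the role of "and".
p_b_given_a = prob(event_a & event_b) / prob(event_a)
print(p_b_given_a)
```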


In Exercises 41 and 42, use the following information. 


• The probability that an airplane flight departs on time is 0.89. 


• The probability that a flight arrives on time is 0.87. 


• The probability that a flight departs and arrives on time is 0.83. 


41. Find the probability that a flight departed on time given that it arrives on time. 


42. Find the probability that a flight arrives on time given that it departed on time. 


The Addition Rule 


WHAT YOU SHOULD LEARN 


> How to determine if two 
events are mutually exclusive 


>» How to use the Addition Rule 
to find the probability of 
two events 


STUDY TIP 


In probability and statistics, the 
word or is usually used as an 
“inclusive or” rather than an 
“exclusive or.” For instance, 
there are three ways for 
“event A or B” to occur. 


(1) A occurs and B does 
not occur. 


(2) B occurs and A 
does not occur. 


(3) A and B both occur. 


Mutually Exclusive Events » The Addition Rule » A Summary of Probability 


>» MUTUALLY EXCLUSIVE EVENTS 


In Section 3.2, you learned how to find the probability of two events, A and B, 
occurring in sequence. Such probabilities are denoted by P(A and B). In this 
section, you will learn how to find the probability that at least one of two events 
will occur. Probabilities such as these are denoted by P(A or B) and depend on 
whether the events are mutually exclusive. 


DEFINITION 


Two events A and B are mutually exclusive if A and B cannot occur at the 
same time. 


The Venn diagrams show the relationship between events that are mutually 
exclusive and events that are not mutually exclusive. 


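[Venn diagrams: on the left, A and B do not overlap (A and B are mutually exclusive); on the right, A and B overlap in a region labeled "A and B" (A and B are not mutually exclusive).] 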


EXAMPLE 1 


> Mutually Exclusive Events 
Decide if the events are mutually exclusive. Explain your reasoning. 


1. Event A: Roll a 3 on a die. 
Event B: Roll a 4 on a die. 


2. Event A: Randomly select a male student. 
Event B: Randomly select a nursing major. 


3. Event A: Randomly select a blood donor with type O blood. 
Event B: Randomly select a female blood donor. 


> Solution 


1. The first event has one outcome, a 3. The second event also has one 
outcome, a 4. These outcomes cannot occur at the same time, so the events 
are mutually exclusive. 


2. Because the student can be a male nursing major, the events are not 
mutually exclusive. 


3. Because the donor can be a female with type O blood, the events are not 
mutually exclusive. 
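When events are written as sets of outcomes, "cannot occur at the same time" means the sets share no outcomes. The sketch below illustrates this for a die roll; the event `odd` is a hypothetical addition for contrast and does not appear in Example 1.

```python
die = set(range(1, 7))                       # sample space for one die roll
roll_3 = {3}                                 # Event A from part 1
roll_4 = {4}                                 # Event B from part 1
odd = {n for n in die if n % 2 == 1}         # hypothetical extra event


def mutually_exclusive(a: set, b: set) -> bool:
    """Two events are mutually exclusive when they have no outcomes in common."""
    return a.isdisjoint(b)


print(mutually_exclusive(roll_3, roll_4))    # a 3 and a 4 cannot both occur
print(mutually_exclusive(roll_3, odd))       # 3 is itself an odd outcome
```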
STUDY TIP 


By subtracting P(A and B) 
you avoid double 
counting the probability 
of outcomes that occur 
in both A and B. 


To explore this topic further, 
see Activity 3.3 on page 166. 


[Venn diagrams for Example 2: "Deck of 52 Cards" (4s, aces, and 44 other cards) and "Roll a Die" (odd numbers and numbers less than three)] 


> Try It Yourself 1 
Decide if the events are mutually exclusive. Explain your reasoning. 
1. Event A: Randomly select a jack from a standard deck of cards. 
Event B: Randomly select a face card from a standard deck of cards. 
2. Event A: Randomly select a 20-year-old student. 
Event B: Randomly select a student with blue eyes. 
3. Event A: Randomly select a vehicle that is a Ford. 
Event B: Randomly select a vehicle that is a Toyota. 
a. Decide if one of the following statements is true. 


• Events A and B cannot occur at the same time. 
• Events A and B have no outcomes in common. 
• P(A and B) = 0 


b. Make a conclusion. Answer: Page A35 


> THE ADDITION RULE 


THE ADDITION RULE FOR THE PROBABILITY OF A OR B 


The probability that events A or B will occur, P(A or B), is given by 
P(A or B) = P(A) + P(B) − P(A and B). 


If events A and B are mutually exclusive, then the rule can be simplified to 
P(A or B) = P(A) + P(B). This simplified rule can be extended to any 
number of mutually exclusive events. 


In words, to find the probability that one event or the other will occur, add the 
individual probabilities of each event and subtract the probability that they both occur. 


EXAMPLE 2 


» Using the Addition Rule to Find Probabilities 


1. You select a card from a standard deck. Find the probability that the card is 
a 4 or an ace. 


2. You roll a die. Find the probability of rolling a number less than 3 or rolling 
an odd number. 
> Solution 


1. If the card is a 4, it cannot be an ace. So, the events are mutually exclusive, 
as shown in the Venn diagram. The probability of selecting a 4 or an ace is 

\[
P(4 \text{ or ace}) = P(4) + P(\text{ace}) = \frac{4}{52} + \frac{4}{52} = \frac{8}{52} = \frac{2}{13} \approx 0.154.
\]


2. The events are not mutually exclusive because 1 is an outcome of both 
events, as shown in the Venn diagram. So, the probability of rolling a 
number less than 3 or an odd number is 

\[
\begin{aligned}
P(\text{less than 3 or odd}) &= P(\text{less than 3}) + P(\text{odd}) - P(\text{less than 3 and odd}) \\
&= \frac{2}{6} + \frac{3}{6} - \frac{1}{6} = \frac{4}{6} = \frac{2}{3} \approx 0.667.
\end{aligned}
\]
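The two results in Example 2 can be verified by enumerating outcomes. This is a minimal sketch, assuming a standard 52-card deck and a fair six-sided die; the helper names `prob` and `p_or` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob(event: set, space: set) -> Fraction:
    """Classical probability of an event within a finite sample space."""
    return Fraction(len(event), len(space))


def p_or(a: set, b: set, space: set) -> Fraction:
    """Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return prob(a, space) + prob(b, space) - prob(a & b, space)


# Part 1: a 4 or an ace from a standard deck (mutually exclusive events).
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))
fours = {card for card in deck if card[0] == "4"}
aces = {card for card in deck if card[0] == "A"}
p_four_or_ace = p_or(fours, aces, deck)

# Part 2: a number less than 3 or an odd number on a die (events overlap at 1).
die = set(range(1, 7))
less_than_3 = {n for n in die if n < 3}
odd = {n for n in die if n % 2 == 1}
p_low_or_odd = p_or(less_than_3, odd, die)

print(p_four_or_ace, round(float(p_four_or_ace), 3))
print(p_low_or_odd, round(float(p_low_or_odd), 3))
```

Because the 4s and the aces share no cards, the subtracted term is zero in part 1, which is the simplified form of the rule.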
> Try It Yourself 2 


1. A die is rolled. Find the probability of rolling a 6 or an odd number. 
2. A card is selected from a standard deck. Find the probability that the card 
is a face card or a heart. 


a. Decide whether the events are mutually exclusive. 
b. Find P(A), P(B), and, if necessary, P(A and B). 
c. Use the Addition Rule to find the probability. Answer: Page A35 


EXAMPLE 3 


> Finding Probabilities of Mutually Exclusive Events 


The frequency distribution shows volumes of sales (in dollars) and the number 
of months in which a sales representative reached each sales level during the 
past three years. If this sales pattern continues, what is the probability that 
the sales representative will sell between $75,000 and $124,999 next month? 


Sales volume ($)     Months 
0–24,999 
25,000–49,999 
50,000–74,999 
75,000–99,999        7 
100,000–124,999      9 
125,000–149,999 
150,000–174,999 
175,000–199,999 


> Solution 
To solve this problem, define events A and B as follows. 


A = {monthly sales between $75,000 and $99,999} 
B = {monthly sales between $100,000 and $124,999} 


Because events A and B are mutually exclusive, the probability that the sales 
representative will sell between $75,000 and $124,999 next month is 


\[
P(A \text{ or } B) = P(A) + P(B) = \frac{7}{36} + \frac{9}{36} = \frac{16}{36} = \frac{4}{9} \approx 0.444.
\]


> Try It Yourself 3 


Find the probability that the sales representative will sell between $0 and 
$49,999. 


a. Identify events A and B. 

b. Decide if the events are mutually exclusive. 

c. Find the probability of each event. 

d. Use the Addition Rule to find the probability. Answer: Page A35 
In a survey conducted by Braun 
Research, coffee drinkers were 
asked how many cups of coffee 
they drink. (Source: Braun Research for 
International Delight Coffee House Inspirations) 


[Pie chart: How Much Coffee Do You Drink?] 


If you selected a coffee drinker 
at random and asked how many 
cups of coffee he or she drinks, 
what is the probability that the 
coffee drinker would say he or 
she drinks 1 cup a week or 
2 cups a week? 


EXAMPLE 4 


» Using the Addition Rule to Find Probabilities 


A blood bank catalogs the types of blood, including positive or negative 
Rh-factor, given by donors during the last five days. The number of donors 
who gave each blood type is shown in the table. A donor is selected at random. 


1. Find the probability that the donor has type O or type A blood. 
2. Find the probability that the donor has type B blood or is Rh-negative. 


> Solution 
1. Because a donor cannot have type O blood and type A blood, these events 
are mutually exclusive. So, using the Addition Rule, the probability that a 
randomly chosen donor has type O or type A blood is 

\[
\begin{aligned}
P(\text{type O or type A}) &= P(\text{type O}) + P(\text{type A}) \\
&= \frac{184}{409} + \frac{164}{409} \\
&= \frac{348}{409} \\
&\approx 0.851.
\end{aligned}
\]

2. Because a donor can have type B blood and be Rh-negative, these events 
are not mutually exclusive. So, using the Addition Rule, the probability that 
a randomly chosen donor has type B blood or is Rh-negative is 

\[
\begin{aligned}
P(\text{type B or Rh-neg}) &= P(\text{type B}) + P(\text{Rh-neg}) - P(\text{type B and Rh-neg}) \\
&= \frac{45}{409} + \frac{65}{409} - \frac{8}{409} \\
&= \frac{102}{409} \\
&\approx 0.249.
\end{aligned}
\]
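The arithmetic in this solution can be reproduced with exact fractions. The sketch below uses only the counts that appear in the solution above (184, 164, 45, 65, and 8 out of 409 donors); the variable names are hypothetical.

```python
from fractions import Fraction

total_donors = 409

# Part 1: type O or type A (mutually exclusive, so nothing is subtracted).
p_type_o = Fraction(184, total_donors)
p_type_a = Fraction(164, total_donors)
p_o_or_a = p_type_o + p_type_a

# Part 2: type B or Rh-negative (8 donors are counted in both events).
p_type_b = Fraction(45, total_donors)
p_rh_neg = Fraction(65, total_donors)
p_b_and_rh_neg = Fraction(8, total_donors)
p_b_or_rh_neg = p_type_b + p_rh_neg - p_b_and_rh_neg

print(p_o_or_a, round(float(p_o_or_a), 3))
print(p_b_or_rh_neg, round(float(p_b_or_rh_neg), 3))
```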


> Try It Yourself 4 


1. Find the probability that the donor has type B or type AB blood. 
2. Find the probability that the donor has type O blood or is Rh-positive. 


a. Identify events A and B. 

b. Decide if the events are mutually exclusive. 

c. Find the probability of each event. 

d. Use the Addition Rule to find the probability. Answer: Page A35 
» A SUMMARY OF PROBABILITY 


Classical Probability 
In words: The number of outcomes in the sample space is known and each outcome is equally likely to occur. 
In symbols: P(E) = (Number of outcomes in event E) / (Number of outcomes in sample space) 

Empirical Probability 
In words: The frequency of outcomes in the sample space is estimated from experimentation. 
In symbols: P(E) = (Frequency of event E) / (Total frequency) = f/n 

Range of Probabilities Rule 
In words: The probability of an event is between 0 and 1, inclusive. 
In symbols: 0 ≤ P(E) ≤ 1 

Complementary Events 
In words: The complement of event E is the set of all outcomes in a sample space that are not included in E, denoted by E’. 
In symbols: P(E’) = 1 − P(E) 

Multiplication Rule 
In words: The Multiplication Rule is used to find the probability of two events occurring in a sequence. 
In symbols: P(A and B) = P(A) · P(B|A) 
            P(A and B) = P(A) · P(B) (independent events) 

Addition Rule 
In words: The Addition Rule is used to find the probability of at least one of two events occurring. 
In symbols: P(A or B) = P(A) + P(B) − P(A and B) 
            P(A or B) = P(A) + P(B) (mutually exclusive events) 


EXAMPLE 5 


» Combining Rules to Find Probabilities 


Use the graph at the right to find 
the probability that a randomly 
selected draft pick is not a running 
back or a wide receiver. 


> Solution 
Define events A and B. 


A: Draft pick is a running back. 
B: Draft pick is a wide receiver. 


These events are mutually exclusive, so the probability that the draft 
pick is a running back or wide receiver is 

\[
P(A \text{ or } B) = P(A) + P(B) = \frac{7}{32} \approx 0.219.
\]


[Figure: NFL Rookies. A breakdown by position of the 256 players picked in the 2009 NFL draft. (Source: National Football League)] 


By taking the complement of P(A or B), you can determine that the 
probability of randomly selecting a draft pick who is not a running back or 
wide receiver is 

\[
1 - P(A \text{ or } B) = 1 - \frac{7}{32} = \frac{25}{32} \approx 0.781.
\]


> Try It Yourself 5 


Find the probability that a randomly selected draft pick is not a linebacker or 


a quarterback. 


a. Find the probability that the draft pick is a linebacker or a quarterback. 


b. Find the complement of the event. 


Answer: Page A35 
BUILDING BASIC SKILLS AND VOCABULARY 


1. If two events are mutually exclusive, why is P(A and B) = 0? 
2. List examples of 


(a) two events that are mutually exclusive. 


(b) two events that are not mutually exclusive. 


True or False? In Exercises 3-6, determine whether the statement is true or 
false. If it is false, explain why. 


3. If two events are mutually exclusive, they have no outcomes in common. 
4. If two events are independent, then they are also mutually exclusive. 
5. The probability that event A or event B will occur is 
P(A or B) = P(A) + P(B) − P(A or B). 
6. If events A and B are mutually exclusive, then 
P(A or B) = P(A) + P(B). 


Graphical Analysis In Exercises 7 and 8, decide if the events shown in the 
Venn diagram are mutually exclusive. Explain your reasoning. 


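7. [Venn diagram: within "College students," a region labeled "Students on Dean’s list" and a region labeled "Student-athletes"] 


8. [Venn diagram: within "Movies," a region labeled "Movies that are rated R" and a region labeled "Movies that are rated PG-13"] 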


Recognizing Mutually Exclusive Events In Exercises 9-12, decide if the 
events are mutually exclusive. Explain your reasoning. 


9. Event A: Randomly select a female public school teacher. 
Event B: Randomly select a public school teacher who is 25 years old. 


10. Event A: Randomly select a member of the U.S. Congress. 
Event B: Randomly select a male U.S. Senator. 


11. Event A: Randomly select a student with a birthday in April. 
Event B: Randomly select a student with a birthday in May. 


12. Event A: Randomly select a person between 18 and 24 years old. 
Event B: Randomly select a person who drives a convertible. 
USING AND INTERPRETING CONCEPTS 


13. Audit During a 52-week period, a company paid overtime wages for 
18 weeks and hired temporary help for 9 weeks. During 5 weeks, the 
company paid overtime and hired temporary help. 


(a) Are the events “selecting a week in which overtime wages were paid” 
and “selecting a week in which temporary help wages were paid” 
mutually exclusive? Explain. 


(b) If an auditor randomly examined the payroll records for only one week, 
what is the probability that the payroll for that week contained overtime 
wages or temporary help wages? 


14. Conference A math conference has an attendance of 4950 people. Of these, 
2110 are college professors and 2575 are female. Of the college professors, 
960 are female. 


(a) Are the events “selecting a female” and “selecting a college professor” 
mutually exclusive? Explain. 


(b) The conference selects people at random to win prizes. Find the 
probability that a selected person is a female or a college professor. 


15. Carton Defects A company that makes cartons finds that the probability of 
producing a carton with a puncture is 0.05, the probability that a carton has 
a smashed corner is 0.08, and the probability that a carton has a puncture and 
has a smashed corner is 0.004. 


(a) Are the events “selecting a carton with a puncture” and “selecting a 
carton with a smashed corner” mutually exclusive? Explain. 


(b) If a quality inspector randomly selects a carton, find the probability that 
the carton has a puncture or has a smashed corner. 


16. Can Defects A company that makes soda pop cans finds that the 
probability of producing a can without a puncture is 0.96, the probability that 
a can does not have a smashed edge is 0.93, and the probability that a can 
does not have a puncture and does not have a smashed edge is 0.893. 


(a) Are the events “selecting a can without a puncture” and “selecting a can 
without a smashed edge” mutually exclusive? Explain. 


(b) If a quality inspector randomly selects a can, find the probability that the 
can does not have a puncture or does not have a smashed edge. 


17. Selecting a Card A card is selected at random from a standard deck. Find 
each probability. 
(a) Randomly selecting a club or a 3 
(b) Randomly selecting a red suit or a king 


(c) Randomly selecting a 9 or a face card 


18. Rolling a Die You roll a die. Find each probability. 


(a) Rolling a 5 or a number greater than 3 
(b) Rolling a number less than 4 or an even number 


(c) Rolling a 2 or an odd number 
19. U.S. Age Distribution The estimated percent distribution of the U.S. 
population for 2020 is shown in the pie chart. Find each probability. (Source: 
U.S. Census Bureau) 
(a) Randomly selecting someone who is under 5 years old 
(b) Randomly selecting someone who is not 65 years or over 
(c) Randomly selecting someone who is between 20 and 34 years old 


[Figure for Exercise 19: pie chart, "U.S. Age Distribution," by age group (5-14, 15-19, 20-24, 25-34, 35-44, 45-64, 65-74, and 75 years or over)] 

[Figure for Exercise 20: pie chart, "Car Occupancy"] 


20. Tacoma Narrows Bridge The percent distribution of the number of 
occupants in vehicles crossing the Tacoma Narrows Bridge in Washington is 
shown in the pie chart. Find each probability. (Source: Washington State 
Department of Transportation) 
(a) Randomly selecting a car with two occupants 
(b) Randomly selecting a car with two or more occupants 
(c) Randomly selecting a car with between two and five occupants, inclusive 


21. Education The number of responses to a survey are shown in the Pareto 
chart. The survey asked 1026 U.S. adults how they would grade the quality of 
public schools in the United States. Each person gave one response. Find 
each probability. (Adapted from CBS News Poll) 
(a) Randomly selecting a person from the sample who did not give the 
public schools an A 
(b) Randomly selecting a person from the sample who gave the public 
schools a D or an F 


[Figure for Exercise 21: Pareto chart, "How Would You Grade the Quality of Public Schools in the U.S.?"; horizontal axis: Response] 


22. Olympics The number of responses to a survey are shown in the Pareto 
chart. The survey asked 1000 U.S. adults if they would watch a large portion 
of the 2010 Winter Olympics. Each person gave one response. Find each 
probability. (Adapted from Rasmussen Reports) 
(a) Randomly selecting a person from the sample who is not at all likely to 
watch a large portion of the Winter Olympics 
(b) Randomly selecting a person from the sample who is not sure whether 
they will watch a large portion of the Winter Olympics 
(c) Randomly selecting a person from the sample who is neither somewhat 
likely nor very likely to watch a large portion of the Winter Olympics 


[Figure for Exercise 22: Pareto chart, "Will You Watch a Large Portion of the Winter Olympics?"; horizontal axis: Response] 
23. Nursing Majors The table shows the number of male and female students 
enrolled in nursing at the University of Oklahoma Health Sciences Center 
for a recent semester. A student is selected at random. Find the probability 
of each event. (Adapted from University of Oklahoma Health Sciences Center 
Office of Institutional Research) 


          Nursing majors   Non-nursing majors   Total 
Males          151               1104            1255 
Females       1016               1693            2709 
Total         1167               2797            3964 


(a) The student is male or a nursing major. 

(b) The student is female or not a nursing major. 

(c) The student is not female or is a nursing major. 

(d) Are the events “being male” and “being a nursing major” mutually 
exclusive? Explain. 


24. Left-Handed People In a sample of 1000 people (525 men and 475 women), 
113 are left-handed (63 men and 50 women). The results of the sample are 
shown in the table. A person is selected at random from the sample. Find the 
probability of each event. 


(a) The person is left-handed or female. 

(b) The person is right-handed or male. 

(c) The person is not right-handed or is a male. 

(d) The person is right-handed and is a female. 

(e) Are the events “being right-handed” and “being female” mutually 
exclusive? Explain. 


25. Charity The table shows the results of a survey that asked 2850 people 
whether they were involved in any type of charity work. A person is selected 
at random from the sample. Find the probability of each event. 


          Frequently   Occasionally   Not at all   Total 
Male         221           456           795        1472 
Female       207           430           741        1378 
Total        428           886          1536        2850 


(a) The person is frequently or occasionally involved in charity work. 
(b) The person is female or not involved in charity work at all. 

(c) The person is male or frequently involved in charity work. 

(d) The person is female or not frequently involved in charity work. 


(e) Are the events “being female” and “being frequently involved in charity 
work” mutually exclusive? Explain. 
26. Eye Survey The table shows the results of a survey that asked 3203 people 
whether they wore contacts or glasses. A person is selected at random from 
the sample. Find the probability of each event. 


          Only contacts   Only glasses   Both   Neither   Total 
Male            64             841        177     456      1538 
Female         189             427        368     681      1665 
Total          253            1268        545    1137      3203 


(a) The person wears only contacts or only glasses. 

(b) The person is male or wears both contacts and glasses. 

(c) The person is female or wears neither contacts nor glasses. 
(d) The person is male or does not wear glasses. 


(e) Are the events “wearing only contacts” and “wearing both contacts and 
glasses” mutually exclusive? Explain. 


EXTENDING CONCEPTS 


27. Writing Is there a relationship between independence and mutual 
exclusivity? To decide, find examples of the following, if possible. 


(a) Describe two events that are dependent and mutually exclusive. 

(b) Describe two events that are independent and mutually exclusive. 

(c) Describe two events that are dependent and not mutually exclusive. 
(d) Describe two events that are independent and not mutually exclusive. 


Use your results to write a conclusion about the relationship between 
independence and mutual exclusivity. 


Addition Rule for Three Events The Addition Rule for the probability that 
event A or B or C will occur, P(A or B or C), is given by 


P(A or B or C) = P(A) + P(B) + P(C) − P(A and B) − P(A and C) 
                 − P(B and C) + P(A and B and C). 


In the Venn diagram shown, P(A or B or C) is represented by the blue areas. 


[Venn diagram of three overlapping events A, B, and C] 
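The three-event rule can be checked against a direct count. In the sketch below, the sample space (the integers 1 to 12, equally likely) and the three events are hypothetical choices made only for illustration; they are not the values from Exercises 28 and 29.

```python
from fractions import Fraction

space = set(range(1, 13))                       # hypothetical sample space: 1..12
a = {n for n in space if n % 2 == 0}            # A: multiples of 2
b = {n for n in space if n % 3 == 0}            # B: multiples of 3
c = {n for n in space if n > 8}                 # C: numbers greater than 8


def prob(event: set) -> Fraction:
    """Classical probability within the 12-outcome sample space."""
    return Fraction(len(event), len(space))


# Addition Rule for three events.
by_formula = (prob(a) + prob(b) + prob(c)
              - prob(a & b) - prob(a & c) - prob(b & c)
              + prob(a & b & c))

# Direct count of outcomes that lie in at least one of the events.
by_counting = prob(a | b | c)

print(by_formula, by_counting)
```

The outcome 12 lies in all three events: it is added three times, then subtracted three times by the pairwise terms, so the final term adds it back once.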


In Exercises 28 and 29, find P(A or B or C) for the given probabilities. 


28. P(A) = 0.40, P(B) = 0.10, P(C) = 0.50, 
P(A and B) = 0.05, P(A and C) = 0.25, P(B and C) = 0.10, 
P(A and B and C) = 0.03 


29. P(A) = 0.38, P(B) = 0.26, P(C) = 0.14, 
P(A and B) = 0.12, P(A and C) = 0.03, P(B and C) = 0.09, 
P(A and B and C) = 0.01 

30. Explain, in your own words, why in the Addition Rule for P(A or B or C), 
P(A and B and C) is added at the end of the formula. 
ACTIVITY 3.3 


Simulating the Probability of Rolling a 3 or 4 


The "Simulating the Probability of Rolling a 3 or 4" applet allows you to investigate 
the probability of rolling a 3 or 4 on a fair die. The plot at the top left corner 
shows the probability associated with each outcome of a die roll. When ROLL is 
clicked, n simulations of the experiment of rolling a die are performed. The 
results of the simulations are shown in the frequency plot. If the animate option 
is checked, the display will show each outcome dropping into the frequency plot 
as the simulation runs. The individual outcomes are shown in the text field at the 
far right of the applet. The center plot shows in blue the cumulative proportion 
of times that an event of rolling a 3 or 4 occurs. The green line in the plot reflects 
the true probability of rolling a 3 or 4. As the experiment is conducted over and 
over, the cumulative proportion should converge to the true value. 
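The behavior described above can be imitated without the applet. The following Python sketch is an assumption-laden stand-in, not the applet itself: the function name, the fixed seed, and the checkpoints are hypothetical. It tracks the cumulative proportion of rolls that show a 3 or 4 so it can be compared with the theoretical value 2/6.

```python
import random


def cumulative_proportions(n_rolls: int, seed: int = 0) -> list:
    """Roll a fair die n_rolls times; return the running proportion of 3s and 4s."""
    rng = random.Random(seed)
    hits = 0
    proportions = []
    for i in range(1, n_rolls + 1):
        if rng.randint(1, 6) in (3, 4):
            hits += 1
        proportions.append(hits / i)
    return proportions


props = cumulative_proportions(10_000)
for n in (10, 100, 1000, 10_000):
    # The running proportion after n rolls, next to the theoretical value.
    print(n, round(props[n - 1], 4), round(2 / 6, 4))
```

Small values of n typically give proportions far from 0.3333, while larger values settle closer to it, which is the convergence the applet's center plot displays.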


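[Applet display: a probability plot for die outcomes 1 to 6, a frequency plot of the simulated rolls, and a plot of cumulative proportion against number of rolls with a reference line at 0.3333] 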


Explore 


Step 1 Specify a value for n. 
Step 2 Click ROLL four times. 
Step 3 Click RESET. 
Step 4 Specify another value for n. 
Step 5 Click ROLL. 


Draw Conclusions 
1. What is the theoretical probability of rolling a 3 or 4? 


2. Run the simulation using each value of n one time. Clear the results after each 
trial. Compare the cumulative proportion of rolling a 3 or 4 for each trial with 
the theoretical probability of rolling a 3 or 4. 


3. Suppose you want to modify the applet so you can find the probability of 
rolling a number less than 4. Describe the placement of the green line. 


United States Congress 


Congress is made up of the House of Representatives and the Senate. Members of the House of 
Representatives serve two-year terms and represent a district in a state. The number of representatives 
each state has is determined by population. States with larger populations have more representatives 
than states with smaller populations. The total number of representatives is set by law at 
435 members. Members of the Senate serve six-year terms and represent a state. Each state has 
2 senators, for a total of 100. The tables show the makeup of the 111th Congress by gender and 
political party. There are two vacant seats in the House of Representatives. 


House of Representatives 

                      Political Party 
Gender    Republican   Democrat   Independent   Total 
Male         161          196          0         357 
Female        17           59          0          76 
Total        178          255          0         433 


Senate 

                      Political Party 
Gender    Republican   Democrat   Independent   Total 
Male          37           44          2          83 
Female         4           13          0          17 
Total         41           57          2         100 


EXERCISES 

1. Find the probability that a randomly selected 
representative is female. Find the probability 
that a randomly selected senator is female. 


2. Compare the probabilities from Exercise 1. 


3. A representative is selected at random. Find 
the probability of each event. 
(a) The representative is male. 
(b) The representative is a Republican. 
(c) The representative is male given that the 
representative is a Republican. 
(d) The representative is female and a 
Democrat. 
(e) Are the events “being female” and “being 
a Democrat” independent or dependent 
events? Explain. 


4. A senator is selected at random. Find the 
probability of each event. 
(a) The senator is male. 
(b) The senator is not a Democrat. 
(c) The senator is female or a Republican. 
(d) The senator is male or a Democrat. 
(e) Are the events “being female” and “being 
an Independent” mutually exclusive? 
Explain. 


5. Using the same row and column headings as 
the tables above, create a combined table for 
Congress. 


6. A member of Congress is selected at random. 
Use the table from Exercise 5 to find the 
probability of each event. 
(a) The member is Independent. 
(b) The member is female and a Republican. 
(c) The member is male or a Democrat. 


WHAT YOU SHOULD LEARN 


> How to find the number of 
ways a group of objects can 
be arranged in order 


> How to find the number of 
ways to choose several objects 
from a group without regard 
to order 


> How to use counting principles 
to find probabilities 


[Figure: Sudoku Number Puzzle] 


STUDY TIP 


Notice at the right 
that as n increases, n! 
becomes very large. 
Take some time now to 
learn how to use the 
factorial key on your 
calculator. 
Additional Topics in Probability and Counting 


Permutations >» Combinations >» Applications of Counting Principles 


>» PERMUTATIONS 


In Section 3.1, you learned that the Fundamental Counting Principle is used to 
find the number of ways two or more events can occur in sequence. In this 
section, you will study several other techniques for counting the number of ways 
an event can occur. An important application of the Fundamental Counting 
Principle is determining the number of ways that n objects can be arranged in 
order or in a permutation. 


DEFINITION 


A permutation is an ordered arrangement of objects. The number of different 
permutations of n distinct objects is n!. 


The expression n! is read as n factorial and is defined as follows. 
n! = n · (n − 1) · (n − 2) · (n − 3) ··· 3 · 2 · 1 
As a special case, 0! = 1. Here are several other values of n!. 


1! = 1, 2! = 2 · 1 = 2, 3! = 3 · 2 · 1 = 6, 4! = 4 · 3 · 2 · 1 = 24 


EXAMPLE 1 


> Finding the Number of Permutations of n Objects 

The objective of a 9 × 9 Sudoku number puzzle is to fill the grid so that each 
row, each column, and each 3 × 3 grid contain the digits 1 to 9. How many 
different ways can the first row of a blank 9 × 9 Sudoku grid be filled? 


> Solution 
The number of permutations is 9! = 9 · 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1 = 362,880. So, 
there are 362,880 different ways the first row can be filled. 
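Factorials grow quickly, so a calculator or a short script helps. The sketch below uses Python's standard `math.factorial`; the variable name `first_row_fillings` is a hypothetical label for the count in Example 1.

```python
import math

# A few values of n!, including the special case 0! = 1.
for n in (0, 1, 2, 3, 4, 9):
    print(n, math.factorial(n))

# Example 1: the number of ways to order the digits 1 to 9 in one Sudoku row.
first_row_fillings = math.factorial(9)
```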


> Try It Yourself 1 

The women’s hockey teams for the 2010 Olympics are Canada, Sweden, 
Switzerland, Slovakia, United States, Finland, Russia, and China. How many 
different final standings are possible? 


a. Determine the total number of women’s hockey teams n that are in the 
2010 Olympics. 


b. Evaluate n!. Answer: Page A35 


Suppose you want to choose some of the objects in a group and put them in 
order. Such an ordering is called a permutation of n objects taken r at a time. 


PERMUTATIONS OF n OBJECTS TAKEN r AT A TIME 
The number of permutations of n distinct objects taken r at a time is 


\[
{}_{n}P_{r} = \frac{n!}{(n - r)!}, \quad \text{where } r \le n.
\]


STUDY TIP 


Detailed instructions for using 
MINITAB, Excel, and the 
TI-83/84 Plus are shown in the 
Technology Guide that accompanies 
this text. For instance, here are 
instructions for finding the number 
of permutations of n objects taken 
r at a time on a TI-83/84 Plus. 


Enter the total number 
of objects n. 

MATH 

Choose the PRB menu. 
2: nPr 

Enter the number of 
objects r taken. 

ENTER 


INSIGHT 


Notice that the Fundamental 
Counting Principle can be used 
in Example 3 to obtain the same 
result. There are 43 choices 
for first place, 42 choices 
for second place, and 
41 choices for third 
place. So, there are 

43 · 42 · 41 = 74,046 

ways the cars can finish 
first, second, and third. 


STUDY TIP 


The letters AAAABBC 
can be rearranged in 7! 
orders, but many of these 
are not distinguishable. 
The number of 
distinguishable orders is 

\[
\frac{7!}{4! \cdot 2! \cdot 1!} = \frac{7 \cdot 6 \cdot 5}{2} = 105.
\]


EXAMPLE 2 


> Finding nPr 

Find the number of ways of forming four-digit codes in which no digit is 
repeated. 

> Solution 


To form a four-digit code with no repeating digits, you need to select 4 digits 
from a group of 10, so n = 10 and r = 4. 


\[
{}_{n}P_{r} = {}_{10}P_{4} = \frac{10!}{(10 - 4)!} = \frac{10!}{6!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 5040
\]

So, there are 5040 possible four-digit codes that do not have repeating digits. 


> Try It Yourself 2 
A psychologist shows a list of eight activities to her subject. How many ways 
can the subject pick a first, second, and third activity? 


a. Find the quotient of n! and (n - r)!. (List the factors and divide out.) 
b. Write the result as a sentence. Answer: Page A35 


EXAMPLE 3 


> Finding $_nP_r$
Forty-three race cars started the 2010 Daytona 500. How many ways can the 
cars finish first, second, and third? 


> Solution 


You need to select three race cars from a group of 43, so n = 43 and r = 3. 
Because the order is important, the number of ways the cars can finish first, 
second, and third is 


$_nP_r = {}_{43}P_3 = \dfrac{43!}{(43 - 3)!} = \dfrac{43!}{40!} = 43 \cdot 42 \cdot 41 = 74{,}046.$
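
The agreement between the permutation formula and the Fundamental Counting Principle in Example 3 can be sketched in a few lines of Python. This is an added illustration, not part of the original solution; falling_product is a hypothetical helper name.

```python
import math


def falling_product(n, r):
    """Multiply the r successive choices n, n - 1, ..., n - r + 1."""
    ways = 1
    for choices in range(n, n - r, -1):
        ways *= choices
    return ways


# Example 3: first, second, and third place among 43 cars.
by_counting = falling_product(43, 3)                        # 43 * 42 * 41
by_formula = math.factorial(43) // math.factorial(43 - 3)   # 43! / 40!
print(by_counting, by_formula)   # 74046 74046
```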


> Try It Yourself 3 


The board of directors of a company has 12 members. One member is the 
president, another is the vice president, another is the secretary, and another is 
the treasurer. How many ways can these positions be assigned? 


a. Identify the total number of objects n and the number of objects r being 
chosen in order. 
b. Evaluate $_nP_r$. Answer: Page A35 


You may want to order a group of n objects in which some of the objects 
are the same. For instance, consider a group of letters consisting of four As, 
two Bs, and one C. How many ways can you order such a group? Using the 
previous formula, you might conclude that there are $_7P_7 = 7!$ possible orders. 
However, because some of the objects are the same, not all of these permutations 
are distinguishable. How many distinguishable permutations are possible? 
The answer can be found using the formula for the number of distinguishable 
permutations. 


INSIGHT

You can think of a combination of n objects chosen r at a time as a
permutation of n objects in which the r selected objects are alike and the
remaining n - r (not selected) objects are alike.


DISTINGUISHABLE PERMUTATIONS 

The number of distinguishable permutations of n objects, where $n_1$ are of 
one type, $n_2$ are of another type, and so on, is 

$\dfrac{n!}{n_1! \cdot n_2! \cdot n_3! \cdots n_k!}$, where $n_1 + n_2 + n_3 + \cdots + n_k = n$.


EXAMPLE 4 


» Finding the Number of Distinguishable Permutations 


A building contractor is planning to develop a subdivision. The subdivision is 
to consist of 6 one-story houses, 4 two-story houses, and 2 split-level houses. 
In how many distinguishable ways can the houses be arranged? 


> Solution 
There are to be 12 houses in the subdivision, 6 of which are of one type 
(one-story), 4 of another type (two-story), and 2 of a third type (split-level). 
So, there are 
$\dfrac{12!}{6! \cdot 4! \cdot 2!} = \dfrac{12 \cdot 11 \cdot 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6! \cdot 4! \cdot 2!} = 13{,}860$ distinguishable ways. 


Interpretation There are 13,860 distinguishable ways to arrange the houses 
in the subdivision. 
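
As a rough check on the formula for distinguishable permutations, the following Python sketch (an added illustration; distinguishable_permutations is a hypothetical helper name) evaluates Example 4 and also compares the formula with a brute-force listing for the letters AAAABBC.

```python
import math
from itertools import permutations


def distinguishable_permutations(counts):
    """Return n! / (n1! * n2! * ... * nk!), where n is the sum of the counts."""
    total = math.factorial(sum(counts))
    for count in counts:
        total //= math.factorial(count)
    return total


# Example 4: 6 one-story, 4 two-story, and 2 split-level houses.
houses = distinguishable_permutations([6, 4, 2])
print(houses)   # 13860

# The letters AAAABBC: formula versus listing every distinct order.
by_formula = distinguishable_permutations([4, 2, 1])
by_listing = len(set(permutations("AAAABBC")))
print(by_formula, by_listing)   # 105 105
```

The brute-force listing is practical only for short strings, since it generates all 7! orders before discarding duplicates.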


> Try It Yourself 4 


The contractor wants to plant six oak trees, nine maple trees, and five poplar 
trees along the subdivision street. The trees are to be spaced evenly. In how 
many distinguishable ways can they be planted? 


a. Identify the total number of objects n and the number of each type of 
object in the groups $n_1$, $n_2$, and $n_3$. 

b. Evaluate $\dfrac{n!}{n_1! \cdot n_2! \cdot n_3!}$. Answer: Page A36 


> COMBINATIONS 


You want to buy three DVDs from a selection of five DVDs labeled A, B, C, D, 
and E. There are 10 ways to make your selections. 


ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE 


In each selection, order does not matter (ABC is the same set as BAC). The 
number of ways to choose r objects from n objects without regard to order is 
called the number of combinations of n objects taken r at a time. 


COMBINATIONS OF n OBJECTS TAKEN r AT A TIME 


A combination is a selection of r objects from a group of n objects without 
regard to order and is denoted by $_nC_r$. The number of combinations of r 
objects selected from a group of n objects is 

$_nC_r = \dfrac{n!}{(n - r)!\,r!}$.


STUDY TIP

Here are instructions for finding the number of combinations of n objects
taken r at a time on a TI-83/84 Plus.

Enter the total number of objects n.
MATH
Choose the PRB menu.
3: nCr
Enter the number of objects r taken.
ENTER


EXAMPLE 5 


> Finding the Number of Combinations 

A state’s department of transportation plans to develop a new section of 
interstate highway and receives 16 bids for the project. The state plans to hire 
four of the bidding companies. How many different combinations of four 
companies can be selected from the 16 bidding companies? 

> Solution 

The state is selecting four companies from a group of 16, so n = 16 and r = 4. 
Because order is not important, there are 

$_nC_r = {}_{16}C_4 = \dfrac{16!}{(16 - 4)!\,4!} = \dfrac{16!}{12!\,4!} = \dfrac{16 \cdot 15 \cdot 14 \cdot 13 \cdot 12!}{12! \cdot 4!} = 1820$ different combinations. 


Interpretation There are 1820 different combinations of four companies that 
can be selected from the 16 bidding companies. 
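
The DVD listing and the count in Example 5 can both be reproduced with the Python standard library. This is a minimal sketch added for illustration; the variable names are arbitrary, and math.comb and math.perm assume Python 3.8 or later.

```python
import math
from itertools import combinations

# The DVD selections: every 3-element subset of the five labels.
dvd_choices = ["".join(subset) for subset in combinations("ABCDE", 3)]
print(len(dvd_choices))   # 10
print(dvd_choices[:3])    # ['ABC', 'ABD', 'ABE']

# Example 5: four companies chosen from 16 bids.
bids = math.comb(16, 4)
print(bids)               # 1820

# Each combination of 4 companies corresponds to 4! ordered arrangements.
print(math.perm(16, 4) // math.factorial(4))   # 1820
```

The last line illustrates why the combination formula carries the extra factor r! in its denominator: dividing the number of permutations by r! removes the orderings of each selection.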


> Try It Yourself 5 


The manager of an accounting department wants to form a three-person 
advisory committee from the 20 employees in the department. In how many 
ways can the manager form this committee? 


a. Identify the number of objects in the group and the number of objects r 
to be selected. 

b. Evaluate ,C,. 

c. Write the result as a sentence. Answer: Page A36 


The table summarizes the counting principles. 


Principle               Description                                    Formula

Fundamental Counting    If one event can occur in m ways and a         $m \cdot n$
Principle               second event can occur in n ways, the
                        number of ways the two events can occur
                        in sequence is $m \cdot n$.

Permutations            The number of different ordered                $n!$
                        arrangements of n distinct objects

                        The number of permutations of n distinct       $_nP_r = \dfrac{n!}{(n - r)!}$
                        objects taken r at a time, where $r \le n$

                        The number of distinguishable permutations     $\dfrac{n!}{n_1! \cdot n_2! \cdots n_k!}$
                        of n objects where $n_1$ are of one type,
                        $n_2$ are of another type, and so on

Combinations            The number of combinations of r objects        $_nC_r = \dfrac{n!}{(n - r)!\,r!}$
                        selected from a group of n objects without
                        regard to order


> APPLICATIONS OF COUNTING PRINCIPLES 


The largest lottery jackpot ever, $390 million, was won in the Mega Millions
lottery. When the Mega Millions jackpot was won, five numbers were chosen from
1 to 56 and one number, the Mega Ball, was chosen from 1 to 46. The winning
numbers are shown below.

If you buy one ticket, what is the probability that you will win the Mega
Millions lottery?


EXAMPLE 6 


> Finding Probabilities 

A student advisory board consists of 17 members. Three members serve as the 
board’s chair, secretary, and webmaster. Each member is equally likely to serve 
in any of the positions. What is the probability of selecting at random the three 
members who currently hold the three positions? 

> Solution There is one favorable outcome and there are 

$_{17}P_3 = \dfrac{17!}{(17 - 3)!} = \dfrac{17!}{14!} = \dfrac{17 \cdot 16 \cdot 15 \cdot 14!}{14!} = 17 \cdot 16 \cdot 15 = 4080$

ways the three positions can be filled. So, the probability of correctly selecting 
the three members who hold each position is 

$P(\text{selecting the three members}) = \dfrac{1}{4080} \approx 0.0002.$


> Try It Yourself 6 

A student advisory board consists of 20 members. Two members serve as the 
board’s chair and secretary. Each member is equally likely to serve in either of 
the positions. What is the probability of selecting at random the two members 
who currently hold the two positions? 

a. Find the number of ways the two positions can be filled. 
b. Find the probability of correctly selecting the two members. 
Answer: Page A36 
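
The arithmetic in Example 6 (17 members, 3 ordered positions) can be reproduced as follows. This Python sketch is an added illustration with arbitrary variable names; it uses exact fractions so that rounding happens only when the result is displayed.

```python
import math
from fractions import Fraction

# Example 6: ordered assignment of 3 positions among 17 members.
ways = math.perm(17, 3)
probability = Fraction(1, ways)        # one favorable outcome
print(ways)                            # 4080
print(round(float(probability), 4))    # 0.0002
```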
EXAMPLE 7 


» Finding Probabilities 


You have 11 letters consisting of one M, four I’s, four S’s, and two P’s. If the 
letters are randomly arranged in order, what is the probability that the 
arrangement spells the word Mississippi? 


> Solution There is one favorable outcome and there are 

$\dfrac{11!}{1! \cdot 4! \cdot 4! \cdot 2!} = 34{,}650$     (11 letters with 1, 4, 4, and 2 like letters)

distinguishable permutations of the given letters. So, the probability that the 
arrangement spells the word Mississippi is 

$P(\text{Mississippi}) = \dfrac{1}{34{,}650} \approx 0.00003.$
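
The letter counts in Example 7 need not be tallied by hand. The sketch below, added for illustration, assumes Python's collections.Counter to count repeated letters; distinguishable_arrangements is a hypothetical helper name.

```python
import math
from collections import Counter
from fractions import Fraction


def distinguishable_arrangements(word):
    """Count the distinct orderings of the letters in word."""
    total = math.factorial(len(word))
    for repeats in Counter(word).values():
        total //= math.factorial(repeats)
    return total


arrangements = distinguishable_arrangements("mississippi")
probability = Fraction(1, arrangements)
print(arrangements)          # 34650
print(float(probability))    # about 0.0000289, which rounds to 0.00003
```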


> Try It Yourself 7 


You have 6 letters consisting of one L, two E’s, two T’s, and one R. If the 
letters are randomly arranged in order, what is the probability that the 
arrangement spells the word letter? 


a. Find the number of favorable outcomes and the number of distinguishable 
permutations. 
b. Find the probability that the arrangement spells the word letter. Answer: Page A36 


EXAMPLE 8 


» Finding Probabilities 

Find the probability of picking five diamonds from a standard deck of playing 
cards. 

> Solution 


The possible number of ways of choosing 5 diamonds out of 13 is $_{13}C_5$. The 
number of possible five-card hands is $_{52}C_5$. So, the probability of being dealt 
5 diamonds is 

$P(5 \text{ diamonds}) = \dfrac{_{13}C_5}{_{52}C_5} = \dfrac{1287}{2{,}598{,}960} \approx 0.0005.$
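
The two combination counts in Example 8 can be verified directly. The following Python lines are an added sketch with arbitrary variable names, assuming math.comb from Python 3.8 or later.

```python
import math
from fractions import Fraction

diamond_hands = math.comb(13, 5)    # ways to choose 5 of the 13 diamonds
all_hands = math.comb(52, 5)        # ways to choose any 5 of the 52 cards
probability = Fraction(diamond_hands, all_hands)
print(diamond_hands, all_hands)        # 1287 2598960
print(round(float(probability), 4))    # 0.0005
```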


> Try It Yourself 8 


Find the probability of being dealt five diamonds from a standard deck of 
playing cards that also includes two jokers. In this case, the joker is considered 
to be a wild card that can be used to represent any card in the deck. 


a. Find the number of ways of choosing 5 diamonds. 
b. Find the number of possible five-card hands. 
c. Find the probability of being dealt five diamonds. Answer: Page A36 


EXAMPLE 9 


» Finding Probabilities 


A food manufacturer is analyzing a sample of 400 corn kernels for the presence 
of a toxin. In this sample, three kernels have dangerously high levels of the 
toxin. If four kernels are randomly selected from the sample, what is the 
probability that exactly one kernel contains a dangerously high level of the toxin? 


> Solution 


The possible number of ways of choosing one toxic kernel out of three toxic 
kernels is $_3C_1$. The possible number of ways of choosing 3 nontoxic kernels 
from 397 nontoxic kernels is $_{397}C_3$. So, using the Fundamental Counting 
Principle, the number of ways of choosing one toxic kernel and three nontoxic 
kernels is 

$_3C_1 \cdot {}_{397}C_3 = 3 \cdot 10{,}349{,}790 = 31{,}049{,}370.$

The number of possible ways of choosing 4 kernels from 400 kernels is 
$_{400}C_4 = 1{,}050{,}739{,}900$. So, the probability of selecting exactly 1 toxic kernel is 

$P(1 \text{ toxic kernel}) = \dfrac{_3C_1 \cdot {}_{397}C_3}{_{400}C_4} = \dfrac{31{,}049{,}370}{1{,}050{,}739{,}900} \approx 0.030.$
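
Example 9 combines two combination counts with the Fundamental Counting Principle, and the same pattern applies whenever a sample is drawn without replacement from two groups. The Python sketch below is an added illustration; prob_exactly and its parameter names are hypothetical.

```python
import math
from fractions import Fraction


def prob_exactly(k, marked, population, draws):
    """P(exactly k marked items) when draws items are chosen without replacement."""
    favorable = math.comb(marked, k) * math.comb(population - marked, draws - k)
    return Fraction(favorable, math.comb(population, draws))


# Example 9: 3 toxic kernels among 400, sample of 4, exactly 1 toxic.
p_one_toxic = prob_exactly(1, 3, 400, 4)
print(math.comb(397, 3))               # 10349790
print(math.comb(400, 4))               # 1050739900
print(round(float(p_one_toxic), 3))    # 0.03
```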


> Try It Yourself 9 


A jury consists of five men and seven women. Three jury members are selected 
at random for an interview. Find the probability that all three are men. 


a. Find the product of the number of ways to choose three men from five and 
the number of ways to choose zero women from seven. 

b. Find the number of ways to choose 3 jury members from 12. 

c. Find the probability that all three are men. Answer: Page A36 


3.4 EXERCISES 


BUILDING BASIC SKILLS AND VOCABULARY 

1. When you calculate the number of permutations of n distinct objects taken 
r at a time, what are you counting? Give an example. 

2. When you calculate the number of combinations of r objects taken from a 
group of n objects, what are you counting? Give an example. 


True or False? In Exercises 3-6, determine whether the statement is true or 
false. If it is false, rewrite it as a true statement. 

3. A combination is an ordered arrangement of objects. 

4. The number of different ordered arrangements of n distinct objects is n!. 

5. If you divide the number of permutations of 11 objects taken 3 at a time by 
3!, you will get the number of combinations of 11 objects taken 3 at a time. 


» 15 = C2 


In Exercises 7-14, perform the indicated calculation. 


Te 
9. 


11. 


13. 


Ps 8. 16 Po 
gC3 10. WP 4 
C4 

C 2 
2Cg Ce 
iP C 
6P2 4, 0&1 
P3 14C7 


In Exercises 15-18, decide if the situation involves permutations, combinations, or 
neither. Explain your reasoning. 

15. The number of ways eight cars can line up in a row for a car wash 

16. The number of ways a four-member committee can be chosen from 10 people 

17. The number of ways 2 captains can be chosen from 28 players on a lacrosse 
team 

18. The number of four-letter passwords that can be created when no letter can 
be repeated 


USING AND INTERPRETING CONCEPTS 


19. Video Games You have seven different video games. How many different 
ways can you arrange the games side by side on a shelf? 

20. Skiing Eight people compete in a downhill ski race. Assuming that there 
are no ties, in how many different orders can the skiers finish? 

21. Security Code In how many ways can the letters A, B, C, D, E, and F be 
arranged for a six-letter security code? 

22. Starting Lineup The starting lineup for a softball team consists of 10 players. 
How many different batting orders are possible using the starting lineup? 

23. Lottery Number Selection A lottery has 52 numbers. In how many different 
ways can 6 of the numbers be selected? (Assume that order of selection is 
not important.) 


24. Assembly Process There are four processes involved in assembling a 
certain product. These processes can be performed in any order. 
Management wants to find which order is the least time-consuming. How 
many different orders will have to be tested? 

25. Bracelets You are putting 4 spacers, 10 gold charms, and 8 silver charms on 
a bracelet. In how many distinguishable ways can the spacers and charms be 
put on the bracelet? 

26. Experimental Group In order to conduct an experiment, 4 subjects are 
randomly selected from a group of 20 subjects. How many different groups 
of four subjects are possible? 

27. Letters In how many distinguishable ways can the letters in the word 
statistics be written? 

28. Jury Selection From a group of 40 people, a jury of 12 people is selected. In 
how many different ways can a jury of 12 people be selected? 

29. Space Shuttle Menu Space shuttle astronauts each consume an average of 
3000 calories per day. One meal normally consists of a main dish, a vegetable 
dish, and two different desserts. The astronauts can choose from 10 main 
dishes, 8 vegetable dishes, and 13 desserts. How many different meals are 
possible? (Source: NASA) 

30. Menu A restaurant offers a dinner special that has 12 choices for entrées, 
10 choices for side dishes, and 6 choices for dessert. For the special, you can 
choose one entrée, two side dishes, and one dessert. How many different 
meals are possible? 

31. Water Samples An environmental agency is analyzing water samples from 
80 lakes for pollution. Five of the lakes have dangerously high levels of 
dioxin. If six lakes are randomly selected from the sample, how many ways 
could one polluted lake and five non-polluted lakes be chosen? Use a 
technology tool. 

32. Soil Samples An environmental agency is analyzing soil samples from 
50 farms for lead contamination. Eight of the farms have dangerously high 
levels of lead. If 10 farms are randomly selected from the sample, how many 
ways could 2 contaminated farms and 8 noncontaminated farms be chosen? 
Use a technology tool. 


Word Jumble In Exercises 33-38, do the following. 


(a) Find the number of distinguishable ways the letters can be arranged. 


(b) There is one arrangement that spells an important term used throughout 
the course. Find the term. 


(c) If the letters are randomly arranged in order, what is the probability that 
the arrangement spells the word from part (b)? Can this event be considered 
unusual? Explain. 


33. palmes 34. nevte 
35. etre 36. rnctee 
37. unoppolati 38. sidtbitoiurn 

39. Horse Race A horse race has 12 entries. Assuming that there are no ties, 
what is the probability that the three horses owned by one person finish first, 
second, and third? 

40. Pizza Toppings A pizza shop offers nine toppings. No topping is used more 
than once. What is the probability that the toppings on a three-topping pizza 
are pepperoni, onions, and mushrooms? 

41. Jukebox You look over the songs on a jukebox and determine that you like 
15 of the 56 songs. 

(a) What is the probability that you like the next three songs that are 
played? (Assume a song cannot be repeated.) 

(b) What is the probability that you do not like the next three songs that are 
played? (Assume a song cannot be repeated.) 

42. Officers The offices of president, vice president, secretary, and treasurer 
for an environmental club will be filled from a pool of 14 candidates. Six of 
the candidates are members of the debate team. 

(a) What is the probability that all of the offices are filled by members of the 
debate team? 

(b) What is the probability that none of the offices are filled by members of 
the debate team? 

43. Employee Selection Four sales representatives for a company are to be 
chosen to participate in a training program. The company has eight sales 
representatives, two in each of four regions. In how many ways can the four 
sales representatives be chosen if (a) there are no restrictions and (b) the 
selection must include a sales representative from each region? (c) What is 
the probability that the four sales representatives chosen to participate in the 
training program will be from only two of the four regions if they are chosen 
at random? 

44. License Plates In a certain state, each automobile license plate number 
consists of two letters followed by a four-digit number. How many distinct 
license plate numbers can be formed if (a) there are no restrictions and 
(b) the letters O and I are not used? (c) What is the probability of selecting 
at random a license plate that ends in an even number? 

45. Password A password consists of two letters followed by a five-digit 
number. How many passwords are possible if (a) there are no restrictions 
and (b) none of the letters or digits can be repeated? (c) What is the 
probability of guessing the password in one trial if there are no restrictions? 

46. Area Code An area code consists of three digits. How many area codes are 
possible if (a) there are no restrictions and (b) the first digit cannot be a 1 or 
a 0? (c) What is the probability of selecting an area code at random that ends 
in an odd number if the first digit cannot be a 1 or a 0? 

47. Repairs In how many orders can three broken computers and two broken 
printers be repaired if (a) there are no restrictions, (b) the printers must be 
repaired first, and (c) the computers must be repaired first? (d) If the order 
of repairs has no restrictions and the order of repairs is done at random, what 
is the probability that a printer will be repaired first? 

48. Defective Units A shipment of 10 microwave ovens contains two defective 
units. In how many ways can a restaurant buy three of these units and 
receive (a) no defective units, (b) one defective unit, and (c) at least two 
nondefective units? (d) What is the probability of the restaurant buying at 
least two nondefective units? 


Rate Your Financial Shape 
Excellent 7%, Good 28%, Fair 39%, Poor 24%, Other 2% 
FIGURE FOR EXERCISES 49-52 


Financial Shape In Exercises 49-52, use the pie chart, which shows how U.S. 
adults rate their financial shape. (Source: Pew Research Center) 


49. Suppose 4 people are chosen at random from a group of 1200. What is the 
probability that all four would rate their financial shape as excellent? (Make 
the assumption that the 1200 people are represented by the pie chart.) 

50. Suppose 10 people are chosen at random from a group of 1200. What is the 
probability that all 10 would rate their financial shape as poor? (Make the 
assumption that the 1200 people are represented by the pie chart.) 

51. Suppose 80 people are chosen at random from a group of 500. What is the 
probability that none of the 80 people would rate their financial shape 
as fair? (Make the assumption that the 500 people are represented by the 
pie chart.) 

52. Suppose 55 people are chosen at random from a group of 500. What is the 
probability that none of the 55 people would rate their financial shape as 
good? (Make the assumption that the 500 people are represented by the 
pie chart.) 

53. Probability In a state lottery, you must correctly select 5 numbers (in any 
order) out of 40 to win the top prize. 

(a) How many ways can 5 numbers be chosen from 40 numbers? 

(b) You purchase one lottery ticket. What is the probability that you will win 
the top prize? 

54. Probability A company that has 200 employees chooses a committee of 
15 to represent employee retirement issues. When the committee is formed, 
none of the 56 minority employees are selected. 

(a) Use a technology tool to find the number of ways 15 employees can be 
chosen from 200. 

(b) Use a technology tool to find the number of ways 15 employees can be 
chosen from 144 nonminorities. 

(c) If the committee is chosen randomly (without bias), what is the 
probability that it contains no minorities? 

(d) Does your answer to part (c) indicate that the committee selection is 
biased? Explain your reasoning. 

55. Cards You are dealt a hand of five cards from a standard deck of playing 
cards. Find the probability of being dealt a hand consisting of 

(a) four-of-a-kind. 

(b) a full house, which consists of three of one kind and two of another kind. 

(c) three-of-a-kind. (The other two cards are different from each other.) 

(d) two clubs and one of each of the other three suits. 

56. Warehouse A warehouse employs 24 workers on first shift and 17 workers 
on second shift. Eight workers are chosen at random to be interviewed about 
the work environment. Find the probability of choosing 

(a) all first-shift workers. 

(b) all second-shift workers. 

(c) six first-shift workers. 

(d) four second-shift workers. 


EXTENDING CONCEPTS 


NBA Draft Lottery In Exercises 57-62, use the following information. The 
National Basketball Association (NBA) uses a lottery to determine which team 
gets the first pick in its annual draft. The teams eligible for the lottery are the 
14 non-playoff teams. Fourteen Ping-Pong balls numbered 1 through 14 are placed 
in a drum. Each of the 14 teams is assigned a certain number of possible 
four-number combinations that correspond to the numbers on the Ping-Pong balls, 
such as 3, 8, 10, and 12, as shown. Four balls are then drawn out to determine the 
first pick in the draft. The order in which the balls are drawn is not important. All 
of the four-number combinations are assigned to the 14 teams by computer except 
for one four-number combination. When this four-number combination is drawn, 
the balls are put back in the drum and another drawing takes place. For instance, 
if Team A has been assigned the four-number combination 3, 8, 10, 12 and the balls 
shown at the left are drawn, then Team A wins the first pick. 


After the first pick of the draft is determined, the process continues to choose the 
teams that will select second and third picks. A team may not win the lottery more 
than once. If the four-number combination belonging to a team that has already 
won is drawn, the balls are put back in the drum and another drawing takes place. 
The remaining order of the draft is determined by the number of losses of each team. 


57. In how many ways can 4 of the numbers 1 to 14 be selected if order is not 
important? How many sets of 4 numbers are assigned to the 14 teams? 


58. In how many ways can four of the numbers be selected if order is important? 


In the Pareto chart, the number of combinations assigned to each of the 14 teams 
is shown. The team with the most losses (the worst team) gets the most chances to 
win the lottery. So, the worst team receives the greatest frequency of four-number 
combinations, 250. The team with the best record of the 14 non-playoff teams has 
the fewest chances, with 5 four-number combinations. 


Frequency of Four-Number Combinations 
Assigned in the NBA Draft Lottery 
(Pareto chart. Vertical axis: Frequency of combinations. Horizontal axis: 
Ranking among 14 non-playoff teams, worst team first.) 


59. For each team, find the probability that the team will win the first pick. 
Which of these events would be considered unusual? Explain. 


60. What is the probability that the team with the worst record will win the 
second pick, given that the team with the best record, ranked 14th, wins the 
first pick? 


61. What is the probability that the team with the worst record will win the third 
pick, given that the team with the best record, ranked 14th, wins the first pick 
and the team ranked 2nd wins the second pick? 


62. What is the probability that neither the first- nor the second-worst team will 
get the first pick? 


Uses 


Probability affects decisions when the weather is forecast, when marketing 
strategies are determined, when medications are selected, and even when 
players are selected for professional sports teams. Although intuition is often 
used for determining probabilities, you will be better able to assess the 
likelihood that an event will occur by applying the rules of classical probability 
and empirical probability. 

For instance, suppose you work for a real estate company and are asked 
to estimate the likelihood that a particular house will sell for a particular price 
within the next 90 days. You could use your intuition, but you could better 
assess the probability by looking at sales records for similar houses. 


Abuses 


One common abuse of probability is thinking that probabilities have “memories.” 
For instance, if a coin is tossed eight times, the probability that it will land heads 
up all eight times is only about 0.004. However, if the coin has already been tossed 
seven times and has landed heads up each time, the probability that it will land 
heads up on the eighth time is 0.5. Each toss is independent of all other tosses. 
The coin does not “remember” that it has already landed heads up seven times. 
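
The two coin-toss probabilities above can be reproduced with exact fractions. The Python sketch below is an added illustration; it assumes a fair coin and independent tosses, as the passage does, and the variable names are arbitrary.

```python
from fractions import Fraction

p_heads = Fraction(1, 2)

# Probability of eight heads in a row, computed before any toss is made.
p_eight = p_heads ** 8
print(float(p_eight))          # 0.00390625, about 0.004

# Probability of heads on the eighth toss, given seven heads already:
# P(eight heads) / P(seven heads).
p_eighth_given_seven = p_eight / p_heads ** 7
print(p_eighth_given_seven)    # 1/2
```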


Ethics 


A human resources director for a company with 100 employees wants to show 
that her company is an equal opportunity employer of women and minorities. 
There are 40 women employees and 20 minority employees in the company. 
Nine of the women employees are minorities. Despite this fact, the director 
reports that 60% of the company is either a woman or a minority. If one 
employee is selected at random, the probability that the employee is a woman 
is 0.4 and the probability that the employee is a minority is 0.2. This does not 
mean, however, that the probability that a randomly selected employee is a 
woman or a minority is 0.4 + 0.2 = 0.6, because nine employees belong to 
both groups. In this case, it would be ethically incorrect to omit this information 
from her report because these individuals would have been counted twice. 
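
The correct figure for the report follows from the Addition Rule, which subtracts the overlap between the two groups. The Python lines below are an added sketch using the counts given in the passage; the variable names are arbitrary.

```python
from fractions import Fraction

employees = 100
women = 40
minorities = 20
both = 9   # women who are also minorities

p_woman = Fraction(women, employees)
p_minority = Fraction(minorities, employees)
p_both = Fraction(both, employees)

# Addition Rule: subtract the overlap so those nine employees are counted once.
p_woman_or_minority = p_woman + p_minority - p_both
print(float(p_woman_or_minority))   # 0.51
```

Under these counts, 51% of the employees are women or minorities, not the 60% obtained by adding the two probabilities without adjustment.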


EXERCISES 


1. Assuming That Probability Has a “Memory” A “Daily Number” lottery 
has a three-digit number from 000 to 999. You buy one ticket each day. Your 
number is 389. 


a. What is the probability of winning next Tuesday and Wednesday? 
b. You won on Tuesday. What’s the probability of winning on Wednesday? 


c. You didn’t win on Tuesday. What’s the probability of winning on 
Wednesday? 


2. Adding Probabilities Incorrectly A town has a population of 500 people. 
Suppose that the probability that a randomly chosen person owns a pickup 
truck is 0.25 and the probability that a randomly chosen person owns an 
SUV is 0.30. What can you say about the probability that a randomly 
chosen person owns a pickup or an SUV? Could this probability be 0.55? 
Could it be 0.60? Explain your reasoning. 


CHAPTER SUMMARY 

What did you learn?                                          EXAMPLE(S)   REVIEW EXERCISES

Section 3.1
• How to identify the sample space of a probability          1, 2         1-4
  experiment and how to identify simple events
• How to use the Fundamental Counting Principle to find      3, 4         5, 6
  the number of ways two or more events can occur
• How to distinguish among classical probability,            5-8          7-12
  empirical probability, and subjective probability
• How to find the probability of the complement of an        9-11         13-16
  event and how to find other probabilities using the
  Fundamental Counting Principle

Section 3.2
• How to find conditional probabilities                      1            17, 18
• How to distinguish between independent and dependent       2            19-21
  events
• How to use the Multiplication Rule to find the             3-5          22-24
  probability of two events occurring in sequence
    $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$ if events are dependent
    $P(A \text{ and } B) = P(A) \cdot P(B)$ if events are independent

Section 3.3
• How to determine if two events are mutually exclusive      1            25-27
• How to use the Addition Rule to find the probability       2-5          28-40
  of two events
    $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
    $P(A \text{ or } B) = P(A) + P(B)$ if events are mutually exclusive

Section 3.4
• How to find the number of ways a group of objects can      1-5          41-50
  be arranged in order and the number of ways to choose
  several objects from a group without regard to order
    $_nP_r = \dfrac{n!}{(n - r)!}$   permutations of n objects taken r at a time
    $\dfrac{n!}{n_1! \cdot n_2! \cdot n_3! \cdots n_k!}$   distinguishable permutations
    $_nC_r = \dfrac{n!}{(n - r)!\,r!}$   combinations of n objects taken r at a time
• How to use counting principles to find probabilities       6-9          51-55


REVIEW EXERCISES 


SECTION 3.1 


In Exercises 1-4, identify the sample space of the probability experiment and 
determine the number of outcomes in the event. Draw a tree diagram if it is 
appropriate. 
1. Experiment: Tossing four coins 
Event: Getting three heads 


2. Experiment: Rolling 2 six-sided dice 
Event: Getting a sum of 4 or 5 


3. Experiment: Choosing a month of the year 
Event: Choosing a month that begins with the letter J 


4. Experiment: Guessing the gender(s) of the three children in a family 
Event: The family has two boys 


In Exercises 5 and 6, use the Fundamental Counting Principle. 


5. A student must choose from 7 classes to take at 8:00 A.M., 4 classes to take at 
9:00 A.M., and 3 classes to take at 10:00 A.M. How many ways can the student 
arrange the schedule? 


6. The state of Virginia’s license plates have three letters followed by four 
digits. Assuming that any letter or digit can be used, how many different license 
plates are possible? 


In Exercises 7-12, classify the statement as an example of classical probability, 
empirical probability, or subjective probability. Explain your reasoning. 


7. On the basis of prior counts, a quality control officer says there is a 0.05 
probability that a randomly chosen part is defective. 


8. The probability of randomly selecting five cards of the same suit from a 
standard deck is about 0.0005. 


9. The chance that Corporation A’s stock price will fall today is 75%. 
10. The probability that a person can roll his or her tongue is 70%. 
11. The probability of rolling 2 six-sided dice and getting a sum greater than 9 
is $\frac{1}{6}$. 
12. The chance that a randomly selected person in the United States is between 
15 and 29 years old is about 21%. (Source: U.S. Census Bureau) 


In Exercises 13 and 14, the table shows the approximate distribution of the sizes of 
firms for a recent year. Use the table to determine the probability of the event. 
(Adapted from U.S. Small Business Administration) 


Employees          0 to 4    5 to 9    10 to 19    20 to 99    100 or more

Percent of firms   60.9%     17.6%     10.7%       9.0%        1.8%


13. What is the probability that a randomly selected firm will have at least 
10 employees? 


14. What is the probability that a randomly selected firm will have fewer than 
20 employees? 




Telephone Numbers: The telephone numbers for a region of a state have an 
area code of 570. The next seven digits represent the local telephone numbers for 
that region. A local telephone number cannot begin with a 0 or 1. Your cousin lives 
within the given area code. 


15. What is the probability of randomly generating your cousin’s telephone 
number? 


16. What is the probability of not randomly generating your cousin’s telephone 
number? 


SECTION 3.2


For Exercises 17 and 18, the two statements below summarize the results of a study 
on the use of plus/minus grading at North Carolina State University. It shows the 
percents of graduate and undergraduate students who received grades with pluses 
and minuses (for example, C+, A-, etc.). (Source: North Carolina State University) 


• Of all students who received one or more plus grades, 92% were
undergraduates and 8% were graduates.
• Of all students who received one or more minus grades, 93% were
undergraduates and 7% were graduates.
17. Find the probability that a student is an undergraduate student, given that 
the student received a plus grade. 
18. Find the probability that a student is a graduate student, given that the 
student received a minus grade. 


In Exercises 19-21, decide whether the events are independent or dependent. 
Explain your reasoning. 


19. Tossing a coin four times, getting four heads, and tossing it a fifth time and 
getting a head 

20. Taking a driver’s education course and passing the driver’s license exam 

21. Getting high grades and being awarded an academic scholarship 


22. You are given that P(A) = 0.35 and P(B) = 0.25. Do you have enough 
information to find P(A and B)? Explain. 


In Exercises 23 and 24, find the probability of the sequence of events. 


23. You are shopping, and your roommate has asked you to pick up toothpaste 
and dental rinse. However, your roommate did not tell you which brands to 
get. The store has eight brands of toothpaste and five brands of dental rinse. 
What is the probability that you will purchase the correct brands of both 
products? Is this an unusual event? Explain. 


24. Your sock drawer has 18 folded pairs of socks, with 8 pairs of white, 6 pairs 
of black, and 4 pairs of blue. What is the probability, without looking in the 
drawer, that you will first select and remove a black pair, then select either a 
blue or a white pair? Is this an unusual event? Explain. 


SECTION 3.3


In Exercises 25-27, decide if the events are mutually exclusive. Explain your 
reasoning. 


25. Event A: Randomly select a red jelly bean from a jar. 
Event B: Randomly select a yellow jelly bean from the same jar. 


26. Event A: Randomly select a person who loves cats. 
Event B: Randomly select a person who owns a dog. 

[Pie chart for Exercises 35 and 36: Students in Elementary Schools. Legible slice labels: Fewer than 300; 300-599; 1000 or more (4.7%).]

27. Event A: Randomly select a U.S. adult registered to vote in Illinois.
Event B: Randomly select a U.S. adult registered to vote in Florida.


28. You are given that P(A) = 0.15 and P(B) = 0.40. Do you have enough
information to find P(A or B)? Explain.


29. A random sample of 250 working adults found that 37% access the Internet
at work, 44% access the Internet at home, and 21% access the Internet at
both work and home. What is the probability that a person in this sample
selected at random accesses the Internet at home or at work?


30. A sample of automobile dealerships found that 19% of automobiles sold are
silver, 22% of automobiles sold are sport utility vehicles (SUVs), and 16% of
automobiles sold are silver SUVs. What is the probability that a randomly
chosen sold automobile from this sample is silver or an SUV?


In Exercises 31-34, determine the probability. 


31. A card is randomly selected from a standard deck. Find the probability that
the card is between 4 and 8, inclusive, or is a club.


32. A card is randomly selected from a standard deck. Find the probability that
the card is red or a queen.


33. A 12-sided die, numbered 1 to 12, is rolled. Find the probability that the roll
results in an odd number or a number less than 4.


34. An 8-sided die, numbered 1 to 8, is rolled. Find the probability that the roll
results in an even number or a number greater than 6.


In Exercises 35 and 36, use the pie chart, which shows the percent distribution of 
the number of students in traditional U.S. elementary schools. (Source: U.S. National 
Center for Education Statistics) 


35. Find the probability of randomly selecting a school with 600 or more students.


36. Find the probability of randomly selecting a school with between 300 and 999
students, inclusive.


In Exercises 37-40, use the Pareto chart, which shows the results of a survey in 
which 874 adults were asked which genre of movie they preferred. (Adapted from 
Rasmussen Reports) 


[Pareto chart: Which Genre of Movie Do You Prefer? Horizontal axis: Response (Comedy, Drama, Action, Science fiction, Horror, Not sure, Some other genre, Musical); vertical axis: Number responding.]


37. Find the probability of randomly selecting an adult from the sample who
prefers an action movie or a horror movie.


38. Find the probability of randomly selecting an adult from the sample who
prefers a drama or a musical.


39. Find the probability of randomly selecting an adult from the sample who
does not prefer a comedy.


40. Find the probability of randomly selecting an adult from the sample who
does not prefer a science fiction movie or an action movie.


SECTION 3.4


In Exercises 41-44, perform the indicated calculation. 


41. uP 42. gPo 43. 1C4 44.


45. Use a technology tool to find 59 Ps.
46. Use a technology tool to find 3gC35.


In Exercises 47-50, use combinations and permutations. 


47. Fifteen cyclists enter a race. In how many ways can they finish first, second,
and third?


48. Five players on a basketball team must each choose a player on the opposing
team to defend. In how many ways can they choose their defensive assignments?


49. A literary magazine editor must choose 4 short stories for this month’s issue
from 17 submissions. In how many ways can the editor choose this month’s
stories?


50. An employer must hire 2 people from a list of 13 applicants. In how many
ways can the employer choose to hire the 2 people?


In Exercises 51-55, use counting principles to find the probability. Then tell 
whether the event can be considered unusual. 


51. A full house consists of a three of one kind and two of another kind. Find the
probability of a full house consisting of three kings and two queens.


52. A security code consists of three letters followed by one digit. The first letter
cannot be an A, B, or C. What is the probability of guessing the security code
in one trial?


53. A batch of 200 calculators contains 3 defective units. What is the probability
that a sample of three calculators will have

(a) no defective calculators?

(b) all defective calculators?

(c) at least one defective calculator?

(d) at least one nondefective calculator?


54. A batch of 350 raffle tickets contains four winning tickets. You buy four
tickets. What is the probability that you have

(a) no winning tickets?

(b) all of the winning tickets?

(c) at least one winning ticket?

(d) at least one nonwinning ticket?


55. A corporation has six male senior executives and four female senior
executives. Four senior executives are chosen at random to attend a
technology seminar. What is the probability of choosing

(a) four men?

(b) four women?

(c) two men and two women?

(d) one man and three women?


CHAPTER QUIZ


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


1. The table shows the number (in thousands) of earned degrees, by level and 
gender, conferred in the United States in a recent year. (Source: U.S. National 
Center for Education Statistics) 


Total 1193 | 1724 | 2917 


A person who earned a degree in the year is randomly selected. Find the 
probability of selecting someone who 

(a) earned a bachelor’s degree. 

(b) earned a bachelor’s degree given that the person is a female. 

(c) earned a bachelor’s degree given that the person is not a female. 

(d) earned an associate’s degree or a bachelor’s degree. 

(e) earned a doctorate given that the person is a male. 

(f) earned a master’s degree or is a female. 

(g) earned an associate’s degree and is a male. 

(h) is a female given that the person earned a bachelor’s degree. 


2. Which event(s) in Exercise 1 can be considered unusual? Explain your 
reasoning. 


3. Decide if the events are mutually exclusive. Then decide if the events are 
independent or dependent. Explain your reasoning. 


Event A: A golfer scoring the best round in a four-round tournament 
Event B: Losing the golf tournament 


4. A shipment of 250 netbooks contains 3 defective units. Determine how 
many ways a vending company can buy three of these units and receive 


(a) no defective units. 
(b) all defective units. 
(c) at least one good unit. 
5. In Exercise 4, find the probability of the vending company receiving 
(a) no defective units. 
(b) all defective units. 
(c) at least one good unit. 


6. The access code for a warehouse’s security system consists of six digits. The 
first digit cannot be 0 and the last digit must be even. How many different 
codes are available? 


7. From a pool of 30 candidates, the offices of president, vice president, secretary, 
and treasurer will be filled. In how many different ways can the offices 
be filled? 


Real Statistics - Real Decisions


You work for the company that runs the Powerball® lottery. Powerball
is a lottery game in which five white balls are chosen from a drum
containing 59 balls and one red ball is chosen from a drum containing
39 balls. To win the jackpot, a player must match all five white balls
and the red ball. Other winners and their prizes are also shown in
the table.

Working in the public relations department, you handle many
inquiries from the media and from lottery players. You receive the
following e-mail.

You list the probability of matching only the red ball as
1/62. I know from my statistics class that the probability of
winning is the ratio of the number of successful outcomes to the
total number of outcomes. Could you please explain why the
probability of matching only the red ball is 1/62?

Your job is to answer this question, using the probability techniques you
have learned in this chapter to justify your answer. In answering the
question, assume only one ticket is purchased.


Powerball Winners and Prizes

Match | Prize | Approximate probability
5 white, 1 red | Jackpot | 1/195,249,054
5 white | $200,000 | 1/5,138,133
4 white, 1 red | $10,000 | 1/723,145
4 white | $100 | 1/19,030
3 white, 1 red | $100 | 1/13,644
3 white | $7 | 1/359
2 white, 1 red | $7 | 1/787
1 white, 1 red | $4 | 1/123
1 red | $3 | 1/62

(Source: Multi-State Lottery Association)

Reprinted with permission from the Multistate Lottery


1. How Would You Do It?

(a) How would you investigate the question about the probability of
matching only the red ball?

(b) What statistical methods taught in this chapter would you use?


2. Answering the Question

Write an explanation that answers the question about the probability
of matching only the red ball. Include in your explanation any
probability formulas that justify your explanation.


3. Another Question

You receive another question asking how the overall probability of
winning a prize in the Powerball lottery is determined. The overall
probability of winning a prize in the Powerball lottery is 1/35. Write an
explanation that answers the question and include any probability
formulas that justify your explanation.


Where Is Powerball Played?

Powerball is played in 42 states, Washington, D.C., and the
U.S. Virgin Islands.

(Source: Multi-State Lottery Association)


TECHNOLOGY


SIMULATION: COMPOSING MOZART VARIATIONS WITH DICE 


Wolfgang Mozart (1756-1791) composed a wide 
variety of musical pieces. In his Musical Dice Game, 
he wrote a Wiener minuet with an almost endless 
number of variations. Each minuet has 16 bars. In 
the eighth and sixteenth bars, the player has a 
choice of two musical phrases. In each of the other 
14 bars, the player has a choice of 11 phrases. 


To create a minuet, Mozart suggested that the 
player toss 2 six-sided dice 16 times. For the eighth 
and sixteenth bars, choose Option 1 if the dice 
total is odd and Option 2 if it is even. For each of 
the other 14 bars, subtract 1 from the dice total. 
The following minuet is the result of the following 
sequence of numbers. 


[Sheet music: the resulting 16-bar minuet, with the phrase chosen for each bar marked beneath it (for example, 6/11 under bar 4 and 2/2 under bar 16).]


EXERCISES


1. How many phrases did Mozart write to create 
the Musical Dice Game minuet? Explain. 


2. How many possible variations are there in 
Mozart’s Musical Dice Game minuet? Explain. 


3. Use technology to randomly select a number 
from 1 to 11. 


(a) What is the theoretical probability of each 
number from 1 to 11 occurring? 


(b) Use this procedure to select 100 integers
from 1 to 11. Tally your results and compare
them with the probabilities in part (a).

4. What is the probability of randomly selecting 
option 6, 7, or 8 for the first bar? For all 14 bars? 
Find each probability using (a) theoretical 
probability and (b) the results of Exercise 3(b). 


5. Use technology to randomly select two
numbers from 1, 2, 3, 4, 5, and 6. Find the sum
and subtract 1 to obtain a total.


(a) What is the theoretical probability of each
total from 1 to 11?

(b) Use this procedure to select 100 totals
from 1 to 11. Tally your results and compare
them with the probabilities in part (a).

6. Repeat Exercise 4 using the results of Exercise 5(b).




Extended solutions are given in the Technology Supplement. 
Technical instruction is provided for MINITAB, Excel, and the TI-83/84 Plus. 


DISCRETE PROBABILITY DISTRIBUTIONS


4.1 Probability Distributions

4.2 Binomial Distributions
    ACTIVITY
    CASE STUDY

4.3 More Discrete Probability Distributions
    USES AND ABUSES
    REAL STATISTICS - REAL DECISIONS
    TECHNOLOGY


The National Climatic Data Center (NCDC) is the 
world’s largest active archive of weather data. 
NCDC archives weather data from the Coast 
Guard, Federal Aviation Administration, Military 
Services, the National Weather Service, and 
voluntary observers. 


WHERE YOU'VE BEEN


In Chapters 1 through 3, you learned how to collect and describe data and how
to find the probability of an event. These skills are used in many different
types of careers. For instance, data about climatic conditions are used to
analyze and forecast the weather throughout the world. On a typical day,
aircraft, National Weather Service cooperative observers, radar, remote sensing
systems, satellites, ships, weather balloons, wind profilers, and a variety of
other data-collection devices work together to provide meteorologists with data
that are used to forecast the weather. Even with this much data, meteorologists
cannot forecast the weather with certainty. Instead, they assign probabilities
to certain weather conditions. For instance, a meteorologist might determine
that there is a 40% chance of rain (based on the relative frequency of rain
under similar weather conditions).


WHERE YOU'RE GOING


In Chapter 4, you will learn how to create and use probability distributions.
Knowing the shape, center, and variability of a probability distribution will
enable you to make decisions in inferential statistics. You are a meteorologist
working on a three-day forecast. Assuming that having rain on one day is
independent of having rain on another day, you have determined that there is a
40% probability of rain (and a 60% probability of no rain) on each of the three
days. What is the probability that it will rain on 0, 1, 2, or 3 of the days?
To answer this, you can create a probability distribution for the possible
outcomes.


Day 1 | Day 2 | Day 3 | Probability | Days of rain
No rain | No rain | No rain | 0.216 | 0
No rain | No rain | Rain | 0.144 | 1
No rain | Rain | No rain | 0.144 | 1
No rain | Rain | Rain | 0.096 | 2
Rain | No rain | No rain | 0.144 | 1
Rain | No rain | Rain | 0.096 | 2
Rain | Rain | No rain | 0.096 | 2
Rain | Rain | Rain | 0.064 | 3


Using the Addition Rule with the probabilities in the tree diagram, you can determine the probabilities
of having rain on various numbers of days. You can then use this information to graph a probability
distribution.


Days of rain | Number of outcomes | Probability
0 | 1 | 0.216
1 | 3 | 0.432
2 | 3 | 0.288
3 | 1 | 0.064
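
The tree-diagram calculation can also be checked by brute force. The following Python sketch enumerates all eight rain/no-rain outcomes, multiplies the daily probabilities (which is valid only under the stated independence assumption), and adds the probabilities of outcomes that share the same number of rainy days. The function name `rain_distribution` is illustrative and does not come from the text.

```python
from collections import defaultdict
from itertools import product

P_RAIN = 0.4  # probability of rain on any one day, as stated in the text


def rain_distribution(days=3, p=P_RAIN):
    """Return {number of rainy days: probability}, assuming independent days."""
    dist = defaultdict(float)
    # Each outcome is a tuple such as (True, False, True): rain, no rain, rain.
    for outcome in product([True, False], repeat=days):
        prob = 1.0
        for rained in outcome:
            prob *= p if rained else (1 - p)  # Multiplication Rule
        dist[sum(outcome)] += prob            # Addition Rule
    return dict(sorted(dist.items()))


distribution = rain_distribution()
for x, px in distribution.items():
    print(x, round(px, 3))
```

Rounded to three decimal places, the four probabilities should agree with the table above.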


[Histogram: Number of Days of Rain. Horizontal axis: days of rain (0, 1, 2, 3); vertical axis: probability, P(x).]


4.1 Probability Distributions


WHAT YOU SHOULD LEARN

• How to distinguish between discrete random variables and continuous random variables
• How to construct a discrete probability distribution and its graph
• How to determine if a distribution is a probability distribution
• How to find the mean, variance, and standard deviation of a discrete probability distribution
• How to find the expected value of a discrete probability distribution


Random Variables • Discrete Probability Distributions • Mean, Variance,
and Standard Deviation • Expected Value

> RANDOM VARIABLES 


The outcome of a probability experiment is often a count or a measure. When this 
occurs, the outcome is called a random variable. 


DEFINITION 


A random variable x represents a numerical value associated with each 
outcome of a probability experiment. 


The word random indicates that x is determined by chance. There are two 
types of random variables: discrete and continuous. 


DEFINITION 


A random variable is discrete if it has a finite or countable number of possible 
outcomes that can be listed. 


A random variable is continuous if it has an uncountable number of possible 
outcomes, represented by an interval on the number line. 


You conduct a study of the number of calls a telemarketer makes in one day.
The possible values of the random variable x are 0, 1, 2, 3, 4, and so on. Because
the set of possible outcomes


{0, 1, 2, 3, 4, ...}


can be listed, x is a discrete random variable. You can represent its values as
points on a number line.


[Number line: Number of Calls (Discrete), with points marked at the whole numbers 0 through 10.]


x can have only whole number values: 0, 1, 2, 3, ....


A different way to conduct the study would be to measure the time (in hours) a 
telemarketer spends making calls in one day. Because the time spent making 
calls can be any number from 0 to 24 (including fractions and decimals), x is a 
continuous random variable. You can represent its values with an interval on a 
number line. 


[Number line: Hours Spent on Calls (Continuous), with an interval from 0 to 24.]


x can have any value between 0 and 24.


When a random variable is discrete, you can list the possible values it can assume. 
However, it is impossible to list all values for a continuous random variable. 




EXAMPLE 1 


> Discrete Variables and Continuous Variables 


Decide whether the random variable x is discrete or continuous. Explain your 
reasoning. 


1. Let x represent the number of Fortune 500 companies that lost money in 
the previous year. 


2. Let x represent the volume of gasoline in a 21-gallon tank. 


> Solution


1. The number of companies that lost money in the previous year can be
counted.


{0, 1, 2, 3, ..., 500}


So, x is a discrete random variable.


2. The amount of gasoline in the tank can be any volume between 0 gallons
and 21 gallons. So, x is a continuous random variable.


STUDY TIP
In most practical applications, discrete random variables represent counted
data, while continuous random variables represent measured data.


> Try It Yourself 1 


Decide whether the random variable x is discrete or continuous. Explain your 
reasoning. 


1. Let x represent the speed of a Space Shuttle. 
2. Let x represent the number of calves born on a farm in one year. 


a. Decide if x represents counted data or measured data. 
b. Make a conclusion and explain your reasoning. Answer: Page A36 


It is important that you can distinguish between discrete and continuous
random variables because different statistical techniques are used to analyze each.
The remainder of this chapter focuses on discrete random variables and their
probability distributions. You will study continuous distributions later.


INSIGHT
Values of variables such as age, height, and weight are usually rounded to
the nearest year, inch, or pound. However, these values represent measured
data, so they are continuous random variables.


> DISCRETE PROBABILITY DISTRIBUTIONS


Each value of a discrete random variable can be assigned a probability. By listing
each value of the random variable with its corresponding probability, you are
forming a discrete probability distribution.


DEFINITION 


A discrete probability distribution lists each possible value the random 
variable can assume, together with its probability. A discrete probability 
distribution must satisfy the following conditions. 


IN WORDS | IN SYMBOLS
1. The probability of each value of the discrete random variable is between 0 and 1, inclusive. | $0 \le P(x) \le 1$
2. The sum of all the probabilities is 1. | $\sum P(x) = 1$


Because probabilities represent relative frequencies, a discrete probability 
distribution can be graphed with a relative frequency histogram. 


Frequency Distribution (Example 2)

Score, x | Frequency, f
1 | 24
2 | 33
3 | 42
4 | 30
5 | 21


[Histogram: Passive-Aggressive Traits. Vertical axis: probability, P(x).]


[Frequency distribution for Try It Yourself 2: sales per day, x, and frequency, f.]


GUIDELINES 


Constructing a Discrete Probability Distribution 

Let x be a discrete random variable with possible outcomes $x_1, x_2, \ldots, x_n$.
1. Make a frequency distribution for the possible outcomes. 

2. Find the sum of the frequencies. 


3. Find the probability of each possible outcome by dividing its frequency by 
the sum of the frequencies. 


4. Check that each probability is between 0 and 1, inclusive, and that the 
sum of all the probabilities is 1. 
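
The four guideline steps can be sketched in a few lines of Python. This is only an illustration: the function name `build_distribution` is hypothetical, and the frequencies are the passive-aggressive test scores used in Example 2 of this section.

```python
import math


def build_distribution(frequencies):
    """Turn a frequency distribution {outcome: frequency} into probabilities."""
    total = sum(frequencies.values())                      # Step 2: sum of frequencies
    dist = {x: f / total for x, f in frequencies.items()}  # Step 3: divide by the sum
    # Step 4: each probability in [0, 1] and the probabilities sum to 1
    assert all(0 <= p <= 1 for p in dist.values())
    assert math.isclose(sum(dist.values()), 1.0)
    return dist


# Step 1: the frequency distribution (score: number of employees)
scores = {1: 24, 2: 33, 3: 42, 4: 30, 5: 21}
dist = build_distribution(scores)
for x, p in dist.items():
    print(x, round(p, 2))
```

Performing the check in Step 4 inside the function means an invalid table, such as one with a negative frequency, is caught before the probabilities are used.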


EXAMPLE 2 (Report 16)


> Constructing and Graphing a Discrete Probability Distribution 


An industrial psychologist administered a personality inventory test for 
passive-aggressive traits to 150 employees. Each individual was given a score 
from 1 to 5, where 1 was extremely passive and 5 extremely aggressive. A score 
of 3 indicated neither trait. The results are shown at the left. Construct a 
probability distribution for the random variable x. Then graph the distribution 
using a histogram. 


> Solution 


Divide the frequency of each score by the total number of individuals in the 
study to find the probability for each value of the random variable. 


$P(1) = \frac{24}{150} = 0.16 \qquad P(2) = \frac{33}{150} = 0.22 \qquad P(3) = \frac{42}{150} = 0.28$

$P(4) = \frac{30}{150} = 0.20 \qquad P(5) = \frac{21}{150} = 0.14$


The discrete probability distribution is shown in the following table. 


x | 1 | 2 | 3 | 4 | 5
P(x) | 0.16 | 0.22 | 0.28 | 0.20 | 0.14

Note that $0 \le P(x) \le 1$ and $\sum P(x) = 1$.


The histogram is shown at the left. Because the width of each bar is one, the 
area of each bar is equal to the probability of a particular outcome. Also, the 
probability of an event corresponds to the sum of the areas of the outcomes 
included in the event. For instance, the probability of the event “having a score 
of 2 or 3” is equal to the sum of the areas of the second and third bars, 


(1)(0.22) + (1)(0.28) = 0.22 + 0.28 = 0.50. 
Interpretation You can see that the distribution is approximately symmetric. 


> Try It Yourself 2 

A company tracks the number of sales new employees make each day during 
a 100-day probationary period. The results for one new employee are shown at 
the left. Construct and graph a probability distribution. 


a. Find the probability of each outcome. 
b. Organize the probabilities in a probability distribution. 
c. Graph the probability distribution using a histogram. Answer: Page A36 


EXAMPLE 3


> Verifying Probability Distributions


Verify that the distribution below (see page 189) is a probability distribution.


Probability Distribution

Days of rain, x | Probability, P(x)
0 | 0.216
1 | 0.432
2 | 0.288
3 | 0.064


> Solution


If the distribution is a probability distribution, then (1) each probability is
between 0 and 1, inclusive, and (2) the sum of the probabilities equals 1.


1. Each probability is between 0 and 1.
2. $\sum P(x) = 0.216 + 0.432 + 0.288 + 0.064 = 1$.


Interpretation Because both conditions are met, the distribution is a
probability distribution.


> Try It Yourself 3


Verify that the distribution you constructed in Try It Yourself 2 is a probability
distribution.


a. Verify that the probability of each outcome is between 0 and 1, inclusive.
b. Verify that the sum of all the probabilities is 1.
c. Make a conclusion. Answer: Page A36


In a recent year in the United States, nearly 11 million traffic accidents were
reported to the police. A histogram of traffic accidents for various age
groups from 16 to 84 is shown. (Adapted from National Safety Council)

[Histogram: U.S. Traffic Accidents by Age. Vertical axis: probability, P(x), scaled from 0 to 0.30.]

Estimate the probability that a randomly selected person involved in a traffic
accident is in the 16 to 34 age group.


EXAMPLE 4


> Identifying Probability Distributions


Decide whether the distribution is a probability distribution. Explain your
reasoning.


1. x | 5 | 6 | 7 | 8
P(x) | 0.28 | 0.21 | 0.43 | 0.15

2. x | 1 | 2 | 3 | 4
P(x) | [values not legible]


> Solution 


1. Each probability is between 0 and 1, but the sum of all the probabilities is 
1.07, which is greater than 1. So, it is not a probability distribution. 


2. The sum of all the probabilities is equal to 1, but P(3) and P(4) are not 
between 0 and 1. So, it is not a probability distribution. Probabilities can 
never be negative or greater than 1. 
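
The two conditions checked in Examples 3 and 4 translate directly into code. The sketch below is a minimal illustration: the function name `is_probability_distribution` and the tolerance value are assumptions rather than anything from the text, and the third test case uses made-up values chosen so that the sum is 1 but one entry is negative.

```python
import math


def is_probability_distribution(probs, tol=1e-9):
    """Check both conditions: every P(x) in [0, 1] and the sum of P(x) equal to 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    # Compare with a small tolerance because decimal fractions are inexact in floating point.
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one


rain = [0.216, 0.432, 0.288, 0.064]      # distribution from Example 3
example_4_1 = [0.28, 0.21, 0.43, 0.15]   # first distribution in Example 4 (sum is 1.07)
hypothetical = [0.5, 0.75, -0.25]        # made-up values: sum is 1, but one is negative

print(is_probability_distribution(rain))
print(is_probability_distribution(example_4_1))
print(is_probability_distribution(hypothetical))
```

Only the first call should report a valid distribution; the second fails the sum condition and the third fails the range condition.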

> Try It Yourself 4 

Decide whether the distribution is a probability distribution. Explain your
reasoning.


1. x | 5 | 6 | 7 | 8
P(x) | [values not legible]

2. x | 1 | 2 | 3 | 4
P(x) | 0.09 | 0.36 | 0.49 | 0.06


a. Verify that the probability of each outcome is between 0 and 1.
b. Verify that the sum of all the probabilities is 1.
c. Make a conclusion. Answer: Page A36


> MEAN, VARIANCE, AND STANDARD DEVIATION


You can measure the center of a probability distribution with its mean and 
measure the variability with its variance and standard deviation. The mean of a 
discrete random variable is defined as follows. 


MEAN OF A DISCRETE RANDOM VARIABLE 


The mean of a discrete random variable is given by

$\mu = \sum xP(x)$.


Each value of x is multiplied by its corresponding probability and the products 
are added. 


The mean of a random variable represents the “theoretical average” of 
a probability experiment and sometimes is not a possible outcome. If the 
experiment were performed many thousands of times, the mean of all the 
outcomes would be close to the mean of the random variable. 


EXAMPLE 5


> Finding the Mean of a Probability Distribution

The probability distribution for the personality inventory test for
passive-aggressive traits discussed in Example 2 is given below. Find the mean
score. What can you conclude?

x | P(x)
1 | 0.16
2 | 0.22
3 | 0.28
4 | 0.20
5 | 0.14

> Solution

Use a table to organize your work, as shown below. From the table, you can see
that the mean score is approximately 2.9. A score of 3 represents an individual
who exhibits neither passive nor aggressive traits. The mean is slightly under 3.

x | P(x) | xP(x)
1 | 0.16 | 1(0.16) = 0.16
2 | 0.22 | 2(0.22) = 0.44
3 | 0.28 | 3(0.28) = 0.84
4 | 0.20 | 4(0.20) = 0.80
5 | 0.14 | 5(0.14) = 0.70
  | $\sum P(x) = 1$ | $\sum xP(x) = 2.94$ (mean)


STUDY TIP
Notice that the mean in Example 5 is rounded to one decimal place. This
rounding was done because the mean of a probability distribution should be
rounded to one more decimal place than was used for the random variable x.
This round-off rule is also used for the variance and standard deviation of a
probability distribution.


Interpretation You can conclude that the mean personality trait is neither
extremely passive nor extremely aggressive, but is slightly closer to passive.
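
The claim that the average of many repeated trials settles near the mean of the random variable can be illustrated with a short simulation. The sketch below first computes $\mu = \sum xP(x)$ for the Example 5 distribution and then draws 100,000 simulated scores. The function name `mean`, the sample size, and the random seed are arbitrary choices made for illustration.

```python
import random


def mean(dist):
    """Mean of a discrete random variable: the sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())


# Distribution from Example 5 (score: probability)
dist = {1: 0.16, 2: 0.22, 3: 0.28, 4: 0.20, 5: 0.14}
mu = mean(dist)

# Simulate many trials; a fixed seed keeps the run repeatable.
rng = random.Random(0)
draws = rng.choices(list(dist), weights=list(dist.values()), k=100_000)
sample_mean = sum(draws) / len(draws)

print(round(mu, 2))           # theoretical mean
print(round(sample_mean, 2))  # simulated average, expected to be close to the mean
```

The simulated average will generally not equal 2.94 exactly, but it should differ from it by only a small amount, and the gap tends to shrink as the number of draws grows.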


> Try It Yourself 5 
Find the mean of the probability distribution you constructed in Try It Yourself 2.
What can you conclude?


a. Find the product of each random outcome and its corresponding probability. 
b. Find the sum of the products. 
c. Make a conclusion. Answer: Page A36 


Although the mean of the random variable of a probability distribution 
describes a typical outcome, it gives no information about how the outcomes 
vary. To study the variation of the outcomes, you can use the variance and 
standard deviation of the random variable of a probability distribution. 


VARIANCE AND STANDARD DEVIATION
OF A DISCRETE RANDOM VARIABLE

The variance of a discrete random variable is

$\sigma^2 = \sum (x - \mu)^2 P(x)$.


The standard deviation is


$\sigma = \sqrt{\sigma^2} = \sqrt{\sum (x - \mu)^2 P(x)}$.
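
As a rough check on these formulas, the following Python sketch computes the variance of the Example 2 distribution two ways: from the definition, and from the shortcut form $\sigma^2 = \left[\sum x^2 P(x)\right] - \mu^2$. The function names are illustrative only. Because the code does not round the intermediate products, it gives 1.6164, which agrees with a hand calculation of 1.616 to three decimal places.

```python
import math


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Definition: sum of (x - mu)^2 * P(x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def variance_shortcut(dist):
    """Shortcut: [sum of x^2 * P(x)] - mu^2."""
    return sum(x ** 2 * p for x, p in dist.items()) - mean(dist) ** 2


dist = {1: 0.16, 2: 0.22, 3: 0.28, 4: 0.20, 5: 0.14}
var = variance(dist)
std_dev = math.sqrt(var)

print(round(var, 4), round(variance_shortcut(dist), 4), round(std_dev, 1))
```

Both variance functions should agree up to floating-point error, which is a quick way to catch an arithmetic slip in either formula.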


STUDY TIP 

A shortcut formula for the variance of a probability distribution is 

$\sigma^2 = \left[\sum x^2 P(x)\right] - \mu^2$. 


EXAMPLE 6 

> Finding the Variance and Standard Deviation 

The probability distribution for the personality inventory test for passive- 
aggressive traits discussed in Example 2 is given below. Find the variance 
and standard deviation of the probability distribution. 

x     P(x) 
1     0.16 
2     0.22 
3     0.28 
4     0.20 
5     0.14 

> Solution 

From Example 5, you know that before rounding, the mean of the distribution 
is $\mu = 2.94$. Use a table to organize your work, as shown below. 

x     P(x)    x - μ     (x - μ)^2    P(x)(x - μ)^2 
1     0.16    -1.94     3.764        0.602 
2     0.22    -0.94     0.884        0.194 
3     0.28     0.06     0.004        0.001 
4     0.20     1.06     1.124        0.225 
5     0.14     2.06     4.244        0.594 

$\sum P(x) = 1$        $\sum P(x)(x - \mu)^2 = 1.616$ (variance) 

So, the variance is 

$\sigma^2 = 1.616 \approx 1.6$ 

and the standard deviation is 

$\sigma = \sqrt{\sigma^2} = \sqrt{1.616} \approx 1.3$. 

Interpretation Most of the data values differ from the mean by no more 
than 1.3. 


STUDY TIP 

Detailed instructions for using MINITAB, Excel, and the TI-83/84 Plus are 
shown in the Technology Guide that accompanies this text. Here are 
instructions for finding the mean and standard deviation of a discrete random 
variable of a probability distribution on a TI-83/84 Plus. 

STAT 
Choose the EDIT menu. 
1: Edit 
Enter the possible values of the discrete random variable x in L1. 
Enter the probabilities P(x) in L2. 
STAT 
Choose the CALC menu. 
1: 1-Var Stats 
ENTER 
2nd L1 , 2nd L2 
ENTER 


> Try It Yourself 6 

Find the variance and standard deviation of the probability distribution 
constructed in Try It Yourself 2. 

a. For each value of x, find the square of the deviation from the mean and 
multiply that value by the corresponding probability of x. 

b. Find the sum of the products found in part (a) for the variance. 

c. Take the square root of the variance to find the standard deviation. 

d. Interpret the results. Answer: Page A36 
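
The calculation in Example 6 can also be checked with a few lines of code. The following is a minimal sketch in Python (not a tool this text uses); the function names `mean`, `variance`, and `variance_shortcut` are illustrative choices. It computes the variance both from the definition and from the shortcut formula in the Study Tip, so the two can be compared.

```python
from math import sqrt

# Distribution from Example 6: scores x and their probabilities P(x).
xs = [1, 2, 3, 4, 5]
ps = [0.16, 0.22, 0.28, 0.20, 0.14]

def mean(xs, ps):
    """Mean of a discrete random variable: sum of x * P(x)."""
    return sum(x * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """Variance from the definition: sum of (x - mu)^2 * P(x)."""
    mu = mean(xs, ps)
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))

def variance_shortcut(xs, ps):
    """Variance from the shortcut formula: [sum of x^2 * P(x)] - mu^2."""
    mu = mean(xs, ps)
    return sum(x * x * p for x, p in zip(xs, ps)) - mu ** 2

mu = mean(xs, ps)
var = variance(xs, ps)
sd = sqrt(var)
print(round(mu, 2), round(var, 3), round(sd, 1))
```

Both variance functions should agree up to floating-point rounding, and the rounded results should match the values found in the example.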




INSIGHT 


In most applications, an expected 
value of 0 has a practical 
interpretation. For instance, 
in games of chance, an 
expected value of 0 implies 
that a game is fair (an 
unlikely occurrence!). 
In a profit and loss 
analysis, an expected 
value of 0 represents 
the break-even point. 


> EXPECTED VALUE 


The mean of a random variable represents what you would expect to happen 
over thousands of trials. It is also called the expected value. 


DEFINITION 


The expected value of a discrete random variable is equal to the mean of the 
random variable. 


Expected Value = $E(x) = \mu = \sum x P(x)$ 


Although probabilities can never be negative, the expected value of a random 
variable can be negative. 


EXAMPLE 7 


> Finding an Expected Value 


At a raffle, 1500 tickets are sold at $2 each for four prizes of $500, $250, $150, 
and $75. You buy one ticket. What is the expected value of your gain? 


> Solution 


To find the gain for each prize, subtract the price of the ticket from the prize. 
For instance, your gain for the $500 prize is 


$500 — $2 = $498 
and your gain for the $250 prize is 
$250 — $2 = $248. 


Write a probability distribution for the possible gains (or outcomes). 


Gain, x              $498      $248      $148      $73       -$2 
Probability, P(x)    1/1500    1/1500    1/1500    1/1500    1496/1500 


Then, using the probability distribution, you can find the expected value. 

$E(x) = \sum x P(x)$ 

$= \$498 \cdot \frac{1}{1500} + \$248 \cdot \frac{1}{1500} + \$148 \cdot \frac{1}{1500} + \$73 \cdot \frac{1}{1500} + (-\$2) \cdot \frac{1496}{1500}$ 

$= -\$1.35$ 


Interpretation Because the expected value is negative, you can expect to lose 
an average of $1.35 for each ticket you buy. 
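
As a check on Example 7, the expected value can be computed with exact fractions. This is a minimal sketch in Python; the variable names are illustrative and not part of the example.

```python
from fractions import Fraction

tickets = 1500                 # number of tickets sold
price = 2                      # price of one ticket, in dollars
prizes = [500, 250, 150, 75]   # one winning ticket per prize

# Gain for each prize is the prize minus the ticket price;
# every other ticket loses the ticket price.
gains = [prize - price for prize in prizes] + [-price]
probs = [Fraction(1, tickets)] * len(prizes)
probs.append(Fraction(tickets - len(prizes), tickets))

# The probabilities must sum to 1 for a valid distribution.
assert sum(probs) == 1

expected = sum(g * p for g, p in zip(gains, probs))
print(float(expected))  # -1.35
```

Using `Fraction` avoids rounding until the final conversion, so the result is exactly -27/20.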


> Try It Yourself 7 


At a raffle, 2000 tickets are sold at $5 each for five prizes of $2000, $1000, $500, 
$250, and $100. You buy one ticket. What is the expected value of your gain? 


a. Find the gain for each prize. 

b. Write a probability distribution for the possible gains. 

c. Find the expected value. 

d. Interpret the results. Answer: Page A36 


BUILDING BASIC SKILLS AND VOCABULARY 


1. What is a random variable? Give an example of a discrete random variable 
and a continuous random variable. Justify your answer. 


2. What is a discrete probability distribution? What are the two conditions that 
determine a probability distribution? 


3. Is the expected value of the probability distribution of a random variable 
always one of the possible values of x? Explain. 


4. What is the significance of the mean of a probability distribution? 


True or False? In Exercises 5-8, determine whether the statement is true or 
false. If it is false, rewrite it as a true statement. 


5. In most applications, continuous random variables represent counted data, 
while discrete random variables represent measured data. 


6. For a random variable x, the word random indicates that the value of x is 
determined by chance. 


7. The mean of a random variable represents the “theoretical average” of a 
probability experiment and sometimes is not a possible outcome. 


8. The expected value of a discrete random variable is equal to the standard 
deviation of the random variable. 


Graphical Analysis In Exercises 9-12, decide whether the graph represents a 
discrete random variable or a continuous random variable. Explain your reasoning. 


9. The attendance at concerts for a rock group 
[Number line from 40,000 to 50,000] 

10. The length of time student-athletes practice each week 
[Number line from 0 to 20] 

11. The distance a baseball travels after being hit 
[Number line] 

12. The annual traffic fatalities in the United States (Source: U.S. National 
Highway Traffic Safety Administration) 
[Number line from 37,000 to 40,000] 


Distinguishing Between Discrete and Continuous Random Variables 
In Exercises 13-20, decide whether the random variable x is discrete or continuous. 
Explain your reasoning. 


13. Let x represent the number of books in a university library. 

14. Let x represent the length of time it takes to get to work. 

15. Let x represent the volume of blood drawn for a blood test. 

16. Let x represent the number of tornadoes in the month of June in Oklahoma. 


17. Let x represent the number of messages posted each month on a social 
networking website. 


18. Let x represent the tension at which a randomly selected guitar’s strings have 
been strung. 


19. Let x represent the amount of snow (in inches) that fell in Nome, Alaska 
last winter. 


20. Let x represent the total number of die rolls required for an individual to roll 
a five. 


[Figure for Exercise 21: histogram of probability P(x) versus test scores] 

[Figure for Exercise 22: histogram of probability P(x) versus number of donations] 


USING AND INTERPRETING CONCEPTS 


21. Employee Testing A company gave psychological tests to prospective 
employees. The random variable x represents the possible test scores. Use the 
histogram to find the probability that a person selected at random from the 
survey’s sample had a test score of (a) more than two and (b) less than four. 


22. Blood Donations A survey asked a sample of people how many times 
they donate blood each year. The random variable x represents the number 
of donations in one year. Use the histogram to find the probability that 
a person selected at random from the survey’s sample donated blood 
(a) more than once in a year and (b) less than three times in a year. 


Determining a Missing Probability In Exercises 23 and 24, determine the 
probability distribution’s missing probability value. 


0.05 ? 0.23 0.21 0.17 0.11 0.08 


Identifying Probability Distributions In Exercises 25 and 26, decide 
whether the distribution is a probability distribution. If it is not a probability 
distribution, identify the property (or properties) that are not satisfied. 


25. Tires A mechanic checked the tire pressures on each car that he worked on 
for one week. The random variable x represents the number of tires that 
were underinflated. 


x       0       1       2       3       4 
P(x)    0.30    0.25    0.25    0.15    0.05 


26. Quality Control A quality inspector checked for imperfections in rolls 
of fabric for one week. The random variable x represents the number of 
imperfections found. 


Constructing Probability Distributions In Exercises 27-32, (a) use the 
frequency distribution to construct a probability distribution, (b) graph the 
probability distribution using a histogram and describe its shape, (c) find the mean, 
variance, and standard deviation of the probability distribution, and (d) interpret 
the results in the context of the real-life situation. 


27. Dogs The number of dogs per household in a small town 


1491    425    168    48    29    14 


28. Baseball The number of games played in the World Series from 1903 to 
2009 (Source: Major League Baseball) 


29. Televisions The number of televisions per household in a small town 


Televisions    0     1      2      3 
Households     26    442    728    1404 


30. Camping Chairs The number of defects per batch of camping chairs 
inspected 


31. Overtime Hours The number of overtime hours worked in one week 
per employee 


32. Extracurricular Activities The number of school-related extracurricular 
activities per student 


Activities    1     2     3     4     5     6     7 
Students      39    52    57    68    41    27    17 


33. Writing The expected value of an accountant’s profit and loss analysis is 0. 
Explain what this means. 


34. Writing In a game of chance, what is the relationship between a “fair bet” 
and its expected value? Explain. 


Finding Expected Value In Exercises 35-40, use the probability distribution 
or histogram to find the (a) mean, (b) variance, (c) standard deviation, and 
(d) expected value of the probability distribution, and (e) interpret the results. 


35. Quiz Students in a class take a quiz with eight questions. The random 
variable x represents the number of questions answered correctly. 


x       0       1       2       3       4       5       6       7       8 
P(x)    0.02    0.02    0.06    0.06    0.08    0.22    0.30    0.16    0.08 


36. 911 Calls A 911 service center recorded the number of calls received per 
hour. The random variable x represents the number of calls per hour for one 
week. 


x       0       1       2       3       4       5       6       7 
P(x)    0.01    0.10    0.26    0.33    0.18    0.06    0.03    0.03 


37. Hurricanes The histogram shows the distribution of hurricanes that 
have hit the U.S. mainland by category, with 1 the weakest level and 5 the 
strongest. (Source: Weather Research Center) 

[Figure for Exercise 37: histogram titled "Hurricanes That Have Hit the U.S." 
showing probability P(x) versus category] 


38. Car Occupancy The histogram shows the distribution of occupants in cars 
crossing the Tacoma Narrows Bridge in Washington each week. (Adapted 
from Washington State Department of Transportation) 

[Figure for Exercise 38: histogram titled "Tacoma Narrows Bridge: Car Occupancy" 
showing probability P(x) versus occupants] 


39. Household Size The histogram shows the distribution of household sizes in 
the United States for a recent year. (Adapted from U.S. Census Bureau) 

[Figure for Exercise 39: histogram titled "Size of Household" showing 
probability versus number of persons] 


40. Carpooling The histogram shows the distribution of carpooling by the 
number of cars per household. (Adapted from Federal Highway Administration) 

[Figure for Exercise 40: histogram titled "Carpooling" showing probability 
versus number of cars per household] 


41. Finding Probabilities Use the probability distribution you made 
for Exercise 27 to find the probability of randomly selecting a household that 
has (a) fewer than two dogs, (b) at least one dog, and (c) between one and 
three dogs, inclusive. 


42. Finding Probabilities Use the probability distribution you made for 
Exercise 28 to find the probability of randomly selecting a World Series that 
consisted of (a) four games, (b) at least five games, and (c) between four and 
six games, inclusive. 


43. Unusual Values A person lives in a household with three dogs and claims 
that having three dogs is not unusual. Use the information in Exercise 27 to 
determine if this person is correct. Explain your reasoning. 


44. Unusual Values A person randomly chooses a World Series in which eight 
games were played and claims that this is an unusual event. Use the information 
in Exercise 28 to determine if this person is correct. Explain your reasoning. 


Games of Chance In Exercises 45 and 46, find the expected net gain to the 
player for one play of the game. If x is the net gain to a player in a game of chance, 
then E(x) is usually negative. This value gives the average amount per game the 
player can expect to lose. 


45. In American roulette, the wheel has the 38 numbers 

00, 0, 1, 2, ..., 34, 35, and 36 

marked on equally spaced slots. If a player bets $1 on a number and wins, 
then the player keeps the dollar and receives an additional 35 dollars. 
Otherwise, the dollar is lost. 


46. A charity organization is selling $5 raffle tickets as part of a fund-raising 
program. The first prize is a trip to Mexico valued at $3450, and the second 
prize is a weekend spa package valued at $750. The remaining 20 prizes are 
$25 gas cards. The number of tickets sold is 6000. 


In Exercises 47 and 48, use StatCrunch to (a) construct and graph a 
probability distribution and (b) describe its shape. 


47. Computers The number of computers per household in a small town 


48. Students The enrollments (in thousands) for grades 1 through 8 in the 
United States for a recent year (Source: U.S. National Center for Education Statistics) 


Grade         1       2       3       4       5       6       7       8 
Enrollment    3750    3640    3627    3585    3601    3660    3715    3765 


EXTENDING CONCEPTS 


Linear Transformation of a Random Variable In Exercises 49 and 50, use 
the following information. For a random variable x, a new random variable y can be 
created by applying a linear transformation y = a + bx, where a and b are 
constants. If the random variable x has mean $\mu_x$ and standard deviation 
$\sigma_x$, then the mean, variance, and standard deviation of y are given by 
the following formulas. 


$\mu_y = a + b\mu_x$        $\sigma_y^2 = b^2 \sigma_x^2$        $\sigma_y = |b| \sigma_x$ 


49. The mean annual salary of employees at a company is $36,000. At the end of 
the year, each employee receives a $1000 bonus and a 5% raise (based on 
salary). What is the new mean annual salary (including the bonus and raise) 
of the employees? 


50. The mean annual salary of employees at a company is $36,000 with a variance 
of 15,202,201. At the end of the year, each employee receives a $2000 bonus and 
a 4% raise (based on salary). What is the standard deviation of the new salaries? 


Independent and Dependent Random Variables Two random 
variables x and y are independent if the value of x does not affect the value of y. If 
the variables are not independent, they are dependent. A new random variable can 
be formed by finding the sum or difference of random variables. If a random 
variable x has mean $\mu_x$ and a random variable y has mean $\mu_y$, then the 
means of the sum and difference of the variables are given by the following 
equations. 


$\mu_{x+y} = \mu_x + \mu_y$        $\mu_{x-y} = \mu_x - \mu_y$ 


If random variables are independent, then the variance and standard deviation 
of the sum or difference of the random variables can be found. So, if a random 
variable x has variance $\sigma_x^2$ and a random variable y has variance 
$\sigma_y^2$, then the variances of the sum and difference of the variables are 
given by the following equations. Note that the variance of the difference is 
the sum of the variances. 

$\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2$        $\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2$ 

In Exercises 51 and 52, the distribution of SAT scores for college-bound male 
seniors has a mean of 1524 and a standard deviation of 317. The distribution of 
SAT scores for college-bound female seniors has a mean of 1496 and a standard 
deviation of 307. One male and one female are randomly selected. Assume their 
scores are independent. (Source: The College Board) 


51. What is the average sum of their scores? What is the average difference of 
their scores? 


52. What is the standard deviation of the difference in their scores? 


WHAT YOU SHOULD LEARN 


How to determine if a 
probability experiment is 
a binomial experiment 


How to find binomial 
probabilities using the 
binomial probability formula 


How to find binomial 
probabilities using technology, 
formulas, and a binomial 
probability table 


How to graph a binomial 
distribution 


How to find the mean, 
variance, and standard 
deviation of a binomial 
probability distribution 


[Figure: five trials of selecting a card, each outcome marked S or F. There are 
two successful outcomes. So, x = 2.] 


Binomial Distributions 


Binomial Experiments > Binomial Probability Formula > Finding Binomial 
Probabilities > Graphing Binomial Distributions > Mean, Variance, and 
Standard Deviation 


> BINOMIAL EXPERIMENTS 


There are many probability experiments for which the results of each trial can 
be reduced to two outcomes: success and failure. For instance, when a basketball 
player attempts a free throw, he or she either makes the basket or does not. 
Probability experiments such as these are called binomial experiments. 


DEFINITION 


A binomial experiment is a probability experiment that satisfies the following 
conditions. 


1. The experiment is repeated for a fixed number of trials, where each trial is 
independent of the other trials. 


2. There are only two possible outcomes of interest for each trial. The 
outcomes can be classified as a success (S) or as a failure (F). 


3. The probability of a success P(S) is the same for each trial. 
4. The random variable x counts the number of successful trials. 


NOTATION FOR BINOMIAL EXPERIMENTS 


SYMBOL       DESCRIPTION 

n            The number of times a trial is repeated 

p = P(S)     The probability of success in a single trial 

q = P(F)     The probability of failure in a single trial (q = 1 - p) 

x            The random variable represents a count of the number of 
             successes in n trials: x = 0, 1, 2, 3, ..., n. 


Here is a simple example of a binomial experiment. From a standard deck of 
cards, you pick a card, note whether it is a club or not, and replace the card. You 
repeat the experiment five times, so n = 5. The outcomes of each trial can be 
classified in two categories: S = selecting a club and F = selecting another suit. 
The probabilities of success and failure are 

$p = P(S) = \frac{1}{4}$ and $q = P(F) = \frac{3}{4}$. 

The random variable x represents the number of clubs selected in the five trials. 
So, the possible values of the random variable are 


0,1,2,3, 4, and 5. 


For instance, if x = 2, then exactly two of the five cards are clubs and the other 
three are not clubs. An example of an experiment with x = 2 is shown at the left. 
Note that x is a discrete random variable because its possible values can be listed. 


In a recent survey, U.S. adults who used the social networking website 
Twitter were asked if they had ever posted comments about their personal 
lives. The respondents’ answers were either yes or no. (Adapted from Zogby 
International) 

Survey question: Have you ever posted comments about your personal life on 
Twitter? 

Why is this a binomial experiment? Identify the probability of success p. 
Identify the probability of failure q. 


EXAMPLE 1 


> Identifying and Understanding Binomial Experiments 

Decide whether the experiment is a binomial experiment. If it is, specify the 
values of n, p, and q, and list the possible values of the random variable x. If it 
is not, explain why. 


1. A certain surgical procedure has an 85% chance of success. A doctor 
performs the procedure on eight patients. The random variable represents 
the number of successful surgeries. 


2. A jar contains five red marbles, nine blue marbles, and six green marbles. 
You randomly select three marbles from the jar, without replacement. The 
random variable represents the number of red marbles. 


> Solution 


1. The experiment is a binomial experiment because it satisfies the four 
conditions of a binomial experiment. In the experiment, each surgery 
represents one trial. There are eight surgeries, and each surgery is 
independent of the others. There are only two possible outcomes for each 
surgery: either the surgery is a success or it is a failure. Also, the 
probability of success for each surgery is 0.85. Finally, the random variable 
x represents the number of successful surgeries. 


n = 8 

p = 0.85 

q = 1 - 0.85 = 0.15 

x = 0, 1, 2, 3, 4, 5, 6, 7, 8 


2. The experiment is not a binomial experiment because it does not satisfy all 
four conditions of a binomial experiment. In the experiment, each marble 
selection represents one trial, and selecting a red marble is a success. When 
the first marble is selected, the probability of success is 5/20. However, 
because the marble is not replaced, the probability of success for 
subsequent trials is no longer 5/20. So, the trials are not independent, and 
the probability of a success is not the same for each trial. 
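
To see why the second experiment fails the conditions, it can help to compute how the probability of a red marble changes after the first draw. The sketch below uses Python with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

red, total = 5, 20  # five red marbles among twenty in the jar

# Probability of red on the first draw.
p_first = Fraction(red, total)

# Probability of red on the second draw depends on the first draw,
# because the first marble is not replaced.
p_second_after_red = Fraction(red - 1, total - 1)
p_second_after_other = Fraction(red, total - 1)

print(p_first, p_second_after_red, p_second_after_other)  # 1/4 4/19 5/19
```

Neither second-draw probability equals 5/20, and the value depends on what was drawn first, so the trials are not independent.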


> Try It Yourself 1 

Decide whether the following is a binomial experiment. If it is, specify the 
values of n, p, and q, and list the possible values of the random variable x. If it 
is not, explain why. 


You take a multiple-choice quiz that consists of 10 questions. Each 
question has four possible answers, only one of which is correct. To 
complete the quiz, you randomly guess the answer to each question. 
The random variable represents the number of correct answers. 


a. Identify a trial of the experiment and what is a success. 
b. Decide if the experiment satisfies the four conditions of a binomial 
experiment. 
c. Make a conclusion and identify n, p, q, and the possible values of x, if 
possible. Answer: Page A36 




> BINOMIAL PROBABILITY FORMULA 


There are several ways to find the probability of x successes in n trials of a 
binomial experiment. One way is to use a tree diagram and the Multiplication 
Rule. Another way is to use the binomial probability formula. 


BINOMIAL PROBABILITY FORMULA 


In a binomial experiment, the probability of exactly x successes in n trials is 

$P(x) = {}_nC_x \, p^x q^{n-x} = \frac{n!}{(n-x)!\,x!} \, p^x q^{n-x}$. 


INSIGHT 

In the binomial probability formula, ${}_nC_x$ determines the number of ways 
of getting x successes in n trials, regardless of order. 


EXAMPLE 2    SC Report 17 


> Finding Binomial Probabilities 

Microfracture knee surgery has a 75% chance of success on patients with 
degenerative knees. The surgery is performed on three patients. Find the 
probability of the surgery being successful on exactly two patients. (Source: 
Illinois Sportsmedicine and Orthopedic Center) 


STUDY TIP 

Recall that n! is read “n factorial” and represents the product of all integers 
from n to 1. For instance, 

$5! = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$. 


> Solution Method 1: Draw a tree diagram and use the Multiplication Rule. 

1st        2nd        3rd                    Number of 
Surgery    Surgery    Surgery    Outcome     Successes    Probability 
S          S          S          SSS         3            3/4 · 3/4 · 3/4 = 27/64 
S          S          F          SSF         2            3/4 · 3/4 · 1/4 = 9/64 
S          F          S          SFS         2            3/4 · 1/4 · 3/4 = 9/64 
S          F          F          SFF         1            3/4 · 1/4 · 1/4 = 3/64 
F          S          S          FSS         2            1/4 · 3/4 · 3/4 = 9/64 
F          S          F          FSF         1            1/4 · 3/4 · 1/4 = 3/64 
F          F          S          FFS         1            1/4 · 1/4 · 3/4 = 3/64 
F          F          F          FFF         0            1/4 · 1/4 · 1/4 = 1/64 

There are three outcomes that have exactly two successes, and each has a 
probability of $\frac{9}{64}$. So, the probability of a successful surgery on 
exactly two patients is $3\left(\frac{9}{64}\right) \approx 0.422$. 

Method 2: Use the binomial probability formula. 

In this binomial experiment, the values of n, p, q, and x are n = 3, 
$p = \frac{3}{4}$, $q = \frac{1}{4}$, and x = 2. The probability of exactly two 
successful surgeries is 

$P(2 \text{ successful surgeries}) = \frac{3!}{(3-2)!\,2!} \left(\frac{3}{4}\right)^2 \left(\frac{1}{4}\right)^1$ 

$= 3\left(\frac{9}{16}\right)\left(\frac{1}{4}\right) = 3\left(\frac{9}{64}\right) = \frac{27}{64} \approx 0.422$. 


> Try It Yourself 2 


A card is selected from a standard deck and replaced. This experiment is 
repeated a total of five times. Find the probability of selecting exactly three clubs. 


a. Identify a trial, a success, and a failure. 
b. Identify n, p, q, and x. 
c. Use the binomial probability formula. Answer: Page A36 
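
The two methods of Example 2 can be compared directly in code. The sketch below, written in Python as an illustration, evaluates the binomial probability formula and also enumerates every S/F outcome the way the tree diagram does. The helper name `binomial_pmf` is an assumption made for this sketch.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_pmf(n, p, x):
    """Probability of exactly x successes in n trials: nCx * p^x * q^(n-x)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

p = Fraction(3, 4)  # 75% chance of success per surgery

# Method 2: the binomial probability formula.
by_formula = binomial_pmf(3, p, 2)

# Method 1: list every outcome of three surgeries and apply the
# Multiplication Rule to those with exactly two successes.
by_tree = Fraction(0)
for outcome in product("SF", repeat=3):
    if outcome.count("S") == 2:
        prob = Fraction(1)
        for result in outcome:
            prob *= p if result == "S" else 1 - p
        by_tree += prob

print(by_formula, by_tree, float(by_formula))  # 27/64 27/64 0.421875
```

The same function can be reused for Try It Yourself 2 by changing n, p, and x.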




By listing the possible values of x with the corresponding probabilities, you 
can construct a binomial probability distribution. 


EXAMPLE 3 


> Constructing a Binomial Distribution 

In a survey, U.S. adults were asked to give reasons why they liked texting on 
their cellular phones. The results are shown in the graph. Seven adults who 
participated in the survey are randomly selected and asked whether they like 
texting because it is quicker than calling. Create a binomial probability 
distribution for the number of adults who respond yes. 

[Figure: graph of survey results on reasons U.S. adults like texting 
(Source: GfK Roper for Best Buy Mobile)] 


> Solution 

From the graph, you can see that 56% of adults like texting because it is quicker 
than calling. So, p = 0.56 and q = 0.44. Because n = 7, the possible values 
of x are 0, 1, 2, 3, 4, 5, 6, and 7. 

$P(0) = {}_7C_0 (0.56)^0 (0.44)^7 = 1(0.56)^0 (0.44)^7 \approx 0.0032$ 
$P(1) = {}_7C_1 (0.56)^1 (0.44)^6 = 7(0.56)^1 (0.44)^6 \approx 0.0284$ 
$P(2) = {}_7C_2 (0.56)^2 (0.44)^5 = 21(0.56)^2 (0.44)^5 \approx 0.1086$ 
$P(3) = {}_7C_3 (0.56)^3 (0.44)^4 = 35(0.56)^3 (0.44)^4 \approx 0.2304$ 
$P(4) = {}_7C_4 (0.56)^4 (0.44)^3 = 35(0.56)^4 (0.44)^3 \approx 0.2932$ 
$P(5) = {}_7C_5 (0.56)^5 (0.44)^2 = 21(0.56)^5 (0.44)^2 \approx 0.2239$ 
$P(6) = {}_7C_6 (0.56)^6 (0.44)^1 = 7(0.56)^6 (0.44)^1 \approx 0.0950$ 
$P(7) = {}_7C_7 (0.56)^7 (0.44)^0 = 1(0.56)^7 (0.44)^0 \approx 0.0173$ 

x     P(x) 
0     0.0032 
1     0.0284 
2     0.1086 
3     0.2304 
4     0.2932 
5     0.2239 
6     0.0950 
7     0.0173 
      $\sum P(x) = 1$ 

Notice in the table that all the probabilities are between 0 and 1 and 
that the sum of the probabilities is 1. 


STUDY TIP 

When probabilities are rounded to a fixed number of decimal places, the sum 
of the probabilities may differ slightly from 1. 


> Try It Yourself 3 


Seven adults who participated in the survey are randomly selected and asked 
whether they like texting because it works where talking won’t do. Create a 
binomial distribution for the number of adults who respond yes. 


a. Identify a trial, a success, and a failure. 
b. Identify n, p, q, and possible values for x. 
c. Use the binomial probability formula for each value of x. 
d. Use a table to show that the properties of a probability distribution are 
satisfied. 


Answer: Page A37 
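
The distribution in Example 3 can be generated in one pass rather than one value at a time. The following Python sketch is offered as an illustration; `binomial_distribution` is a hypothetical helper name, not a function from any tool this text uses.

```python
from math import comb

def binomial_distribution(n, p):
    """Return [P(0), P(1), ..., P(n)] for a binomial experiment."""
    q = 1 - p
    return [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

# Example 3: seven adults, 56% like texting because it is quicker.
dist = binomial_distribution(7, 0.56)

for x, prob in enumerate(dist):
    print(x, f"{prob:.4f}")

# Both properties of a probability distribution can be checked directly.
print(all(0 <= prob <= 1 for prob in dist), round(sum(dist), 10))
```

Changing the arguments to the values identified in Try It Yourself 3 would produce that distribution in the same way.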


STUDY TIP 


Here are instructions for finding 
a binomial probability on 
a TI-83/84 Plus. 


DISTR 
0: binompdf( 


Enter the values of n, 
p, and x separated 
by commas. 


ENTER 


STUDY TIP 


Recall that if a probability 
is 0.05 or less, it is 
typically considered 
unusual. 


> FINDING BINOMIAL PROBABILITIES 


In Examples 2 and 3, you used the binomial probability formula to find the 
probabilities. A more efficient way to find binomial probabilities is to use a 
calculator or a computer. For instance, you can find binomial probabilities using 
MINITAB, Excel, and the TI-83/84 Plus. 


EXAMPLE 4 


> Finding a Binomial Probability Using Technology 


The results of a recent survey indicate that 67% of U.S. adults consider air 
conditioning a necessity. If you randomly select 100 adults, what is the 
probability that exactly 75 adults consider air conditioning a necessity? Use a 
technology tool to find the probability. (Source: Opinion Research Corporation) 


> Solution 

MINITAB, Excel, and the TI-83/84 Plus each have features that allow you to 
find binomial probabilities automatically. Try using these technologies. You 
should obtain results similar to the following. 


MINITAB 

Probability Distribution Function 

Binomial with n = 100 and p = 0.67 

x     P(X = x) 
75    0.0201004 


TI-83/84 PLUS 

binompdf(100,.67,75) 
          .0201004116 


EXCEL 

BINOMDIST(75,100,0.67,FALSE) 
0.020100412 


Interpretation From these displays, you can see that the probability that 
exactly 75 adults consider air conditioning a necessity is about 0.02. Because 
0.02 is less than 0.05, this can be considered an unusual event. 
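
Any tool that can evaluate the binomial probability formula gives the same result. As an illustration outside the three technologies named above, here is a minimal Python sketch using only the standard library.

```python
from math import comb

n, p, x = 100, 0.67, 75

# Binomial probability formula: nCx * p^x * q^(n-x).
prob = comb(n, x) * p**x * (1 - p) ** (n - x)

print(round(prob, 7))

# An event with probability 0.05 or less is typically considered unusual.
print(prob <= 0.05)
```

The printed probability should agree with the displays shown in the example to the digits they report.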


> Try It Yourself 4 


The results of a recent survey indicate that 71% of people in the United States 
use more than one topping on their hot dogs. If you randomly select 250 
people, what is the probability that exactly 178 of them will use more than one 
topping? Use a technology tool to find the probability. (Source: ICR Survey 
Research Group for Hebrew International) 


a. Identify n, p, and x. 

b. Calculate the binomial probability. 

c. Interpret the results. 

d. Determine if the event is unusual. Explain. Answer: Page A37 


EXAMPLE 5 


> Finding Binomial Probabilities Using Formulas 


A survey indicates that 41% of women in the United States consider reading 
their favorite leisure-time activity. You randomly select four U.S. women and 
ask them if reading is their favorite leisure-time activity. Find the probability 
that (1) exactly two of them respond yes, (2) at least two of them respond yes, 
and (3) fewer than two of them respond yes. (Source: Louis Harris & Associates) 


> Solution 


1. Using n = 4, p = 0.41, q = 0.59, and x = 2, the probability that exactly 
two women will respond yes is 

$P(2) = {}_4C_2 (0.41)^2 (0.59)^2 = 6(0.41)^2 (0.59)^2 \approx 0.351$. 

Using a TI-83/84 Plus, you can find the probability in part (1) automatically. 

binompdf(4,.41,2) 
          .35109366 


2. To find the probability that at least two women will respond yes, find the 
sum of P(2), P(3), and P(4). 

$P(2) = {}_4C_2 (0.41)^2 (0.59)^2 = 6(0.41)^2 (0.59)^2 \approx 0.351094$ 
$P(3) = {}_4C_3 (0.41)^3 (0.59)^1 = 4(0.41)^3 (0.59)^1 \approx 0.162654$ 
$P(4) = {}_4C_4 (0.41)^4 (0.59)^0 = 1(0.41)^4 (0.59)^0 \approx 0.028258$ 

So, the probability that at least two will respond yes is 

$P(x \ge 2) = P(2) + P(3) + P(4)$ 
$\approx 0.351094 + 0.162654 + 0.028258$ 
$\approx 0.542$. 


STUDY TIP 

The complement of “x is at least 2” is “x is less than 2.” So, another way to 
find the probability in part (3) is 

$P(x < 2) = 1 - P(x \ge 2) \approx 1 - 0.542 = 0.458$. 


3. To find the probability that fewer than two women will respond yes, find the 
sum of P(0) and P(1). 

$P(0) = {}_4C_0 (0.41)^0 (0.59)^4 = 1(0.41)^0 (0.59)^4 \approx 0.121174$ 
$P(1) = {}_4C_1 (0.41)^1 (0.59)^3 = 4(0.41)^1 (0.59)^3 \approx 0.336822$ 

So, the probability that fewer than two will respond yes is 

$P(x < 2) = P(0) + P(1)$ 
$\approx 0.121174 + 0.336822$ 
$\approx 0.458$. 

The cumulative distribution function (CDF) computes the probability of 
“x or fewer” successes. The CDF adds the areas for the given x-value 
and all those to its left. 

binomcdf(4,.41,1) 
          .45799517 


> Try It Yourself 5 



A survey indicates that 21% of men in the United States consider fishing their 
favorite leisure-time activity. You randomly select five U.S. men and ask them 
if fishing is their favorite leisure-time activity. Find the probability that 
(1) exactly two of them respond yes, (2) at least two of them respond yes, and 
(3) fewer than two of them respond yes. (Source: Louis Harris & Associates) 


a. Determine the appropriate value of x for each situation. 

b. Find the binomial probability for each value of x. Then find the sum, if 
necessary. 

c. Write the result as a sentence. Answer: Page A37 
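
The three parts of Example 5 follow one pattern: a single probability, a sum of probabilities, and a complement. The Python sketch below illustrates that pattern; `binomial_pmf` and `binomial_cdf` are hypothetical helper names chosen to echo the calculator functions, not calls into any particular library.

```python
from math import comb

def binomial_pmf(n, p, x):
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(n, p, x):
    """Probability of x or fewer successes: P(0) + P(1) + ... + P(x)."""
    return sum(binomial_pmf(n, p, k) for k in range(x + 1))

n, p = 4, 0.41

exactly_two = binomial_pmf(n, p, 2)      # part (1)
fewer_than_two = binomial_cdf(n, p, 1)   # part (3): P(0) + P(1)
at_least_two = 1 - fewer_than_two        # part (2), using the complement

print(round(exactly_two, 3), round(at_least_two, 3), round(fewer_than_two, 3))
# 0.351 0.542 0.458
```

Computing "at least two" as a complement gives the same value as summing P(2), P(3), and P(4), with fewer terms to evaluate.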


To explore this topic further, 
see Activity 4.2 on page 216. 


Finding binomial probabilities with the binomial probability formula can 
be a tedious process. To make this process easier, you can use a binomial 
probability table. Table 2 in Appendix B lists the binomial probabilities for 
selected values of n and p. 


EXAMPLE 6 


> Finding a Binomial Probability Using a Table 


About ten percent of workers (16 years and over) in the United States 
commute to their jobs by carpooling. You randomly select eight workers. What 
is the probability that exactly four of them carpool to work? Use a table to find 
the probability. (Source: American Community Survey) 


> Solution 

A portion of Table 2 in Appendix B is shown here. Using the distribution for 
n = 8 and p = 0.1, you can find the probability that x = 4, as shown by the 
highlighted areas in the table. 


                                              p 
n  x |  .01   .05   .10   .15   .20   .25   .30   .35   .40   .45   .50   .55   .60 
2  0 | .980  .902  .810  .723  .640  .563  .490  .423  .360  .303  .250  .203  .160 
   1 | .020  .095  .180  .255  .320  .375  .420  .455  .480  .495  .500  .495  .480 
   2 | .000  .002  .010  .023  .040  .063  .090  .123  .160  .203  .250  .303  .360 
3  0 | .970  .857  .729  .614  .512  .422  .343  .275  .216  .166  .125  .091  .064 
   1 | .029  .135  .243  .325  .384  .422  .441  .444  .432  .408  .375  .334  .288 
   2 | .000  .007  .027  .057  .096  .141  .189  .239  .288  .334  .375  .408  .432 
   3 | .000  .000  .001  .003  .008  .016  .027  .043  .064  .091  .125  .166  .216 
8  0 | .923  .663  .430  .272  .168  .100  .058  .032  .017  .008  .004  .002  .001 
   1 | .075  .279  .383  .385  .336  .267  .198  .137  .090  .055  .031  .016  .008 
   2 | .003  .051  .149  .238  .294  .311  .296  .259  .209  .157  .109  .070  .041 
   3 | .000  .005  .033  .084  .147  .208  .254  .279  .279  .257  .219  .172  .124 
   4 | .000  .000  .005  .018  .046  .087  .136  .188  .232  .263  .273  .263  .232 
   5 | .000  .000  .000  .003  .009  .023  .047  .081  .124  .172  .219  .257  .279 
   6 | .000  .000  .000  .000  .001  .004  .010  .022  .041  .070  .109  .157  .209 
   7 | .000  .000  .000  .000  .000  .000  .001  .003  .008  .016  .031  .055  .090 
   8 | .000  .000  .000  .000  .000  .000  .000  .000  .001  .002  .004  .008  .017 


Interpretation So, the probability that exactly four of the eight workers 
carpool to work is 0.005. Because 0.005 is less than 0.05, this can be considered 
an unusual event. 
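
The table lookup in Example 6 can be cross-checked with the binomial probability formula. The following is a minimal Python sketch using only the standard library; the helper name `binomial_pmf` is an illustrative choice and does not come from the text.

```python
from math import comb

def binomial_pmf(n, p, x):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Example 6: n = 8 workers, p = 0.1, exactly x = 4 carpool.
prob = binomial_pmf(8, 0.1, 4)
print(round(prob, 3))   # 0.005, the value read from the table
print(prob < 0.05)      # True, so the event is considered unusual
```

Rounding to three decimal places mirrors the precision of the printed table.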


> Try It Yourself 6 


About fifty-five percent of all small businesses in the United States have a 
website. If you randomly select 10 small businesses, what is the probability that 
exactly four of them have a website? Use a table to find the probability. 
(Adapted from Webvisible/Nielsen Online) 


a. Identify a trial, a success, and a failure. 

b. Identify n, p, and x. 

c. Use Table 2 in Appendix B to find the binomial probability. 

d. Interpret the results. 

e. Determine if the event is unusual. Explain. Answer: Page A37 

> GRAPHING BINOMIAL DISTRIBUTIONS 


In Section 4.1, you learned how to graph discrete probability distributions. 
Because a binomial distribution is a discrete probability distribution, you can use 
the same process. 


EXAMPLE 7 


> Graphing a Binomial Distribution 

Sixty percent of households in the United States own a video game 
console. You randomly select six households and ask them if they own a video 
game console. Construct a probability distribution for the random variable x. 
Then graph the distribution. (Source: Deloitte LLP) 


> Solution 


To construct the binomial distribution, find the probability for each value of x. 
Using n = 6, p = 0.6, and q = 0.4, you can obtain the following. 


x        0      1      2      3      4      5      6
P(x)   0.004  0.037  0.138  0.276  0.311  0.187  0.047


You can graph the probability distribution using a histogram as shown below. 


[Histogram: Owning a Video Game Console; horizontal axis: Households, vertical axis: Probability]


Interpretation From the histogram, you can see that it would be unusual if 
none, only one, or all six of the households owned a video game console 
because of the low probabilities. 
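
The distribution in Example 7 can also be built programmatically. The sketch below is one possible approach in Python, assuming only the standard library; the variable names and the text-based bars are illustrative stand-ins for the histogram.

```python
from math import comb

n, p = 6, 0.6   # six households, 60% own a video game console

# P(x) for every possible number of households x = 0, 1, ..., n
dist = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

for x, prob in dist.items():
    bar = "#" * round(prob * 50)   # crude text histogram, 50 characters = probability 1
    print(f"{x}  {prob:.3f}  {bar}")
```

The short bars at x = 0, x = 1, and x = 6 correspond to the low-probability outcomes noted in the interpretation.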


> Try It Yourself 7 

Eighty-one percent of households in the United States own a computer. You 
randomly select four households and ask if they own a computer. Construct 
a probability distribution for the random variable x. Then graph the 
distribution. (Source: Nielsen) 


a. Find the binomial probability for each value of the random variable x. 

b. Organize the values of x and their corresponding probabilities in a table. 
c. Use a histogram to graph the binomial distribution. Then describe its shape. 
d. Are any of the events unusual? Explain. Answer: Page A37 


Notice in Example 7 that the histogram is skewed left. The graph of a 
binomial distribution with p > 0.5 is skewed left, whereas the graph of a 
binomial distribution with p < 0.5 is skewed right. The graph of a binomial 
distribution with p = 0.5 is symmetric. 
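
The relationship between p and the direction of skew can be checked numerically. The sketch below computes the third standardized moment of a binomial distribution directly from its probabilities; using that moment as the measure of skewness is an assumption made for this illustration, not a measure defined in this section. A negative value indicates a left skew and a positive value a right skew.

```python
from math import comb, sqrt

def skewness(n, p):
    """Third standardized moment of a binomial distribution, computed from its pmf."""
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    sd = sqrt(sum((x - mean)**2 * px for x, px in enumerate(pmf)))
    return sum(((x - mean) / sd)**3 * px for x, px in enumerate(pmf))

for p in (0.2, 0.5, 0.8):
    # positive for p < 0.5, essentially zero for p = 0.5, negative for p > 0.5
    print(p, skewness(6, p))
```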

> MEAN, VARIANCE, AND STANDARD DEVIATION 

Although you can use the formulas you learned in Section 4.1 for mean, variance, 
and standard deviation of a discrete probability distribution, the properties of a 
binomial distribution enable you to use much simpler formulas. 


POPULATION PARAMETERS OF A BINOMIAL 
DISTRIBUTION 


Mean: $\mu = np$ 

Variance: $\sigma^2 = npq$ 

Standard deviation: $\sigma = \sqrt{npq}$ 


EXAMPLE 8 


> Finding and Interpreting Mean, Variance, 

and Standard Deviation 
In Pittsburgh, Pennsylvania, about 56% of the days in a year are cloudy. Find 
the mean, variance, and standard deviation for the number of cloudy days 
during the month of June. Interpret the results and determine any unusual 
values. (Source: National Climatic Data Center) 


> Solution 


There are 30 days in June. Using n = 30, p = 0.56, and q = 0.44, you can find 
the mean, variance, and standard deviation as shown below. 


$\mu = np = 30 \cdot 0.56 = 16.8$ 

$\sigma^2 = npq = 30 \cdot 0.56 \cdot 0.44 \approx 7.4$ 

$\sigma = \sqrt{npq} = \sqrt{30 \cdot 0.56 \cdot 0.44} \approx 2.7$ 


Interpretation On average, there are 16.8 cloudy days during the month 
of June. The standard deviation is about 2.7 days. Values that are more than 
two standard deviations from the mean are considered unusual. Because 
16.8 - 2(2.7) = 11.4, a June with 11 cloudy days or fewer would be unusual. 
Similarly, because 16.8 + 2(2.7) = 22.2, a June with 23 cloudy days or more 
would also be unusual. 
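
The calculations in Example 8 can be reproduced in a few lines. This is a minimal Python sketch, assuming the two-standard-deviation rule stated above; the variable names are illustrative.

```python
from math import sqrt, floor, ceil

n, p = 30, 0.56          # 30 days in June, 56% of days cloudy
q = 1 - p

mean = n * p             # mu = np
variance = n * p * q     # sigma^2 = npq
sd = sqrt(variance)      # sigma = sqrt(npq)

low = mean - 2 * sd      # values below this are unusual
high = mean + 2 * sd     # values above this are unusual

print(round(mean, 1), round(variance, 1), round(sd, 1))   # 16.8 7.4 2.7
print(floor(low), ceil(high))                             # 11 23
```

The last line gives the largest whole number of cloudy days below the lower cutoff and the smallest whole number above the upper cutoff, which agrees with the interpretation in the example.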


> Try It Yourself 8 


In San Francisco, California, 44% of the days in a year are clear. Find the 
mean, variance, and standard deviation for the number of clear days during the 
month of May. Interpret the results and determine any unusual events. 
(Source: National Climatic Data Center) 


a. Identify a success and the values of n, p, and q. 

b. Find the product of n and p to calculate the mean. 

c. Find the product of n, p, and q for the variance. 

d. Find the square root of the variance to find the standard deviation. 

e. Interpret the results. 

f. Determine any unusual events. Answer: Page A37 

BUILDING BASIC SKILLS AND VOCABULARY 


1. In a binomial experiment, what does it mean to say that each trial is 
independent of the other trials? 


2. In a binomial experiment with n trials, what does the random variable measure? 


Graphical Analysis In Exercises 3 and 4, match each given probability with the 
correct graph. The histograms represent binomial distributions. Each distribution 
has the same number of trials n but different probabilities of success p. 


3. p = 0.20, p = 0.50, p = 0.80 
[Figure: three histograms labeled (a), (b), and (c), each with x = 0 to 4 on the horizontal axis.]

4. p = 0.25, p = 0.50, p = 0.75 

[Figure: three histograms labeled (a), (b), and (c), each showing P(x) for x = 0 to 5.]


Graphical Analysis In Exercises 5 and 6, match each given value of n with the 
correct graph. Each histogram shown represents part of a binomial distribution. 
Each distribution has the same probability of success p but different numbers of 
trials n. What happens as the value of n increases and p remains the same? 


5. n = 4, n = 8, n = 12 

[Figure: three histograms labeled (a), (b), and (c), each showing P(x) for x = 0 to 12.]


6. n = 5, n = 10, n = 15 

[Figure: histograms labeled (a), (b), and (c), each showing P(x) for x = 0 to 15.]

7. Identify the unusual values of x in each histogram in Exercise 5. 


8. Identify the unusual values of x in each histogram in Exercise 6. 


Identifying and Understanding Binomial Experiments In Exercises 
9-12, decide whether the experiment is a binomial experiment. If it is, identify a 
success, specify the values of n, p, and q, and list the possible values of the random 
variable x. If it is not a binomial experiment, explain why. 


9. Cyanosis Cyanosis is the condition of having bluish skin due to insufficient 
oxygen in the blood. About 80% of babies born with cyanosis recover fully. 
A hospital is caring for five babies born with cyanosis. The random variable 
represents the number of babies that recover fully. (Source: The World Book 
Encyclopedia) 


10. Clothing Store Purchases From past records, a clothing store finds that 
26% of the people who enter the store will make a purchase. During a 
one-hour period, 18 people enter the store. The random variable represents 
the number of people who do not make a purchase. 


11. Survey A survey asks 1400 chief financial officers, “Has the economy 
forced you to postpone or reduce the amount of vacation you plan to take 
this year?” Thirty-one percent of those surveyed say they are postponing or 
reducing the amount of vacation. Twenty officers participating in the survey 
are randomly selected. The random variable represents the number of 
officers who are postponing or reducing the amount of vacation. (Source: 
Robert Half Management Resources) 


12. Lottery A state lottery randomly chooses 6 balls numbered from 1 through 
40. You choose six numbers and purchase a lottery ticket. The random 
variable represents the number of matches on your ticket to the numbers 
drawn in the lottery. 


Mean, Variance, and Standard Deviation In Exercises 13-16, find the 
mean, variance, and standard deviation of the binomial distribution with the given 
values of n and p. 


13. n = 50, p = 0.4 14. n = 84, p = 0.65 
15. n = 124, p = 0.26 16. n = 316, p = 0.82 


USING AND INTERPRETING CONCEPTS 


Finding Binomial Probabilities In Exercises 17-26, find the indicated 
probabilities. If convenient, use technology to find the probabilities. 


17. Answer Guessing You are taking a multiple-choice quiz that consists of 
five questions. Each question has four possible answers, only one of which 
is correct. To complete the quiz, you randomly guess the answer to each 
question. Find the probability of guessing (a) exactly three answers correctly, 
(b) at least three answers correctly, and (c) less than three answers correctly. 


18. Surgery Success A surgical technique is performed on seven patients. You 
are told there is a 70% chance of success. Find the probability that the 
surgery is successful for (a) exactly five patients, (b) at least five patients, and 
(c) less than five patients. 


19. Baseball Fans Fifty-nine percent of men consider themselves fans of 
professional baseball. You randomly select 10 men and ask each if he 
considers himself a fan of professional baseball. Find the probability that 
the number who consider themselves baseball fans is (a) exactly eight, (b) at 
least eight, and (c) less than eight. (Source: Gallup Poll) 



20. Favorite Cookie Ten percent of adults say oatmeal raisin is their favorite 
cookie. You randomly select 12 adults and ask them to name their favorite 
cookie. Find the probability that the number who say oatmeal raisin is their 
favorite cookie is (a) exactly four, (b) at least four, and (c) less than four. 
(Source: WEAREVER) 


21. Savings Fifty-five percent of U.S. households say they would feel secure if 
they had $50,000 in savings. You randomly select 8 households and ask them 
if they would feel secure if they had $50,000 in savings. Find the probability 
that the number that say they would feel secure is (a) exactly five, (b) more 
than five, and (c) at most five. (Source: HSBC Consumer Survey) 


22. Honeymoon Financing Seventy percent of married couples paid for their 
honeymoon themselves. You randomly select 20 married couples and ask 
them if they paid for their honeymoon themselves. Find the probability that 
the number of couples who say they paid for their honeymoon themselves is 
(a) exactly one, (b) more than one, and (c) at most one. (Source: Bride’s 
Magazine) 


23. Financial Advice Forty-three percent of adults say they get their financial 
advice from family members. You randomly select 14 adults and ask them if 
they get their financial advice from family members. Find the probability that 
the number who say they get their financial advice from family members is 
(a) exactly five, (b) at least six, and (c) at most three. (Source: Sun Life 
Unretirement Index) 


24. Retirement Fourteen percent of workers believe they will need less than 
$250,000 when they retire. You randomly select 10 workers and ask them 
how much money they think they will need for retirement. Find the 
probability that the number of workers who say they will need less than 
$250,000 when they retire is (a) exactly two, (b) more than six, and (c) at 
most five. (Source: Retirement Corporation of America) 


25. Credit Cards Twenty-eight percent of college students say they use credit 
cards because of the rewards program. You randomly select 10 college 
students and ask them to name the reason they use credit cards. Find the 
probability that the number of college students who say they use credit cards 
because of the rewards program is (a) exactly two, (b) more than two, and (c) 
between two and five, inclusive. (Source: Experience.com) 


26. Movies on Phone Twenty-five percent of adults say they would watch 
streaming movies on their phone at work. You randomly select 12 adults and 
ask them if they would watch streaming movies on their phone at work. Find 
the probability that the number who say they would watch streaming movies 
on their phone at work is (a) exactly four, (b) more than four, and (c) 
between four and eight, inclusive. (Source: mSpot) 


Constructing Binomial Distributions In Exercises 27-30, (a) construct 
a binomial distribution, (b) graph the binomial distribution using a histogram and 
describe its shape, (c) find the mean, variance, and standard deviation of the 
binomial distribution, and (d) interpret the results in the context of the real-life 
situation. What values of the random variable x would you consider unusual? 
Explain your reasoning. 


27. Visiting the Dentist Sixty-three percent of adults say they are visiting the 
dentist less because of the economy. You randomly select six adults and ask 
them if they are visiting the dentist less because of the economy. (Source: 
American Optometric Association) 

28. No Trouble Sleeping One in four adults claims to have no trouble sleeping 
at night. You randomly select five adults and ask them if they have any 
trouble sleeping at night. (Source: Marist Institute for Public Opinion) 


29. Blood Donors Five percent of people in the United States eligible to 
donate blood actually do. You randomly select four eligible blood donors and 
ask them if they donate blood. (Source: MetLife Consumer Education Center) 


30. Blood Types Thirty-nine percent of people in the United States have type 
O+ blood. You randomly select five Americans and ask them if their blood 
type is O+. (Source: American Association of Blood Banks) 


31. Annoying Flights The graph shows the results of a survey of travelers 
who were asked to name what they found most annoying on a flight. You 
randomly select six people who participated in the survey and ask them to 
name what they find most annoying on a flight. Let x represent the number 
who name crying kids as the most annoying. (Source: USA Today) 


(a) Construct a binomial distribution. 
(b) Find the probability that exactly two people name “crying kids.” 
(c) Find the probability that at least five people name “crying kids.” 


[Figure for Exercise 31: "What Do You Find Most Annoying on a Flight?" Categories shown: Bad food/no food 3%, Grouchy crew 6%, Bad seatmates 14%, Crying kids, Cramped quarters.]

[Figure for Exercise 32: "Small-business owners want better customer service skills." Which business skills would you like to develop further? Top responses: Customer service 63%, Marketing/sales 58%, Financial management 48%, … making 40%, Negotiation 30%. Note: Multiple responses allowed. Source: OPEN from American Express Small Business Monitor survey of 625 small-business owners with fewer than 100 employees. Margin of error ±4 percentage points.]

32. Small-Business Owners The graph shows the results of a survey of 
small-business owners who were asked which business skills they would like 
to develop further. You randomly select five owners who participated in the 
survey and ask them which business skills they want to develop further. Let 
x represent the number who said financial management was the skill they 
wanted to develop further. (Source: American Express) 


(a) Construct a binomial distribution. 


(b) Find the probability that exactly two owners say “financial 
management.” 


(c) Find the probability that fewer than four owners say “financial 
management.” 


33. Find the mean and standard deviation of the binomial distribution in 
Exercise 31 and interpret the results in the context of the real-life situation. 
What values of x would you consider unusual? Explain your reasoning. 


34. Find the mean and standard deviation of the binomial distribution in 
Exercise 32 and interpret the results in the context of the real-life situation. 
What values of x would you consider unusual? Explain your reasoning. 

In Exercises 35 and 36, use the StatCrunch binomial calculator to find 
the indicated probabilities. Then determine if the event is unusual. Explain your 
reasoning. 


35. Pet Owners Sixty-six percent of pet owners say they consider their pet to 
be their best friend. You randomly select 10 pet owners and ask them if they 
consider their pet to be their best friend. Find the probability that the 
number who say their pet is their best friend is (a) exactly nine, (b) at least 
seven, and (c) at most three. (Adapted from Kelton Research) 


36. Eco-Friendly Vehicles Fifty-three percent of 18- to 30-year-olds say they 
would pay more for an eco-friendly vehicle. You randomly select eight 18- to 
30-year-olds and ask each if they would pay more for an eco-friendly vehicle. 
Find the probability that the number who say they would pay more for an 
eco-friendly vehicle is (a) exactly four, (b) at least five, and (c) less than two. 
(Source: Deloitte LLP and Michigan State University) 


EXTENDING CONCEPTS 


Multinomial Experiments In Exercises 37 and 38, use the following 
information. 


A multinomial experiment is a probability experiment that satisfies the 
following conditions. 


1. The experiment is repeated a fixed number of times n where each trial is 
independent of the other trials. 


2. Each trial has k possible mutually exclusive outcomes: $E_1, E_2, E_3, \ldots, E_k$. 


3. Each outcome has a fixed probability. So, $P(E_1) = p_1$, $P(E_2) = p_2$, 
$P(E_3) = p_3$, ..., $P(E_k) = p_k$. The sum of the probabilities for all 
outcomes is 

$p_1 + p_2 + p_3 + \cdots + p_k = 1$. 


4. $x_1$ is the number of times $E_1$ will occur, $x_2$ is the number of times $E_2$ will 
occur, $x_3$ is the number of times $E_3$ will occur, and so on. 


5. The discrete random variable x counts the number of times $x_1, x_2, x_3, \ldots, x_k$ 
occurs in n independent trials where 

$x_1 + x_2 + x_3 + \cdots + x_k = n$. 

The probability that x will occur is 

$P(x) = \dfrac{n!}{x_1!\, x_2!\, x_3! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} p_3^{x_3} \cdots p_k^{x_k}$. 


37. Genetics According to a theory in genetics, if tall and colorful plants are 
crossed with short and colorless plants, four types of plants will result: tall 
and colorful, tall and colorless, short and colorful, and short and colorless, with 
corresponding probabilities of 9/16, 3/16, 3/16, and 1/16. If 10 plants are selected, 
find the probability that 5 will be tall and colorful, 2 will be tall and colorless, 
2 will be short and colorful, and 1 will be short and colorless. 


38. Genetics Another proposed theory in genetics gives the corresponding 
probabilities for the four types of plants described in Exercise 37 as 5/16, 4/16, 1/16, 
and 6/16. If 10 plants are selected, find the probability that 5 will be tall and 
colorful, 2 will be tall and colorless, 2 will be short and colorful, and 1 will be 
short and colorless. 
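
The multinomial probability formula given before Exercise 37 translates directly into code. The following is a minimal Python sketch; the function name `multinomial_pmf` and the three-outcome example (probabilities 0.5, 0.3, and 0.2 over six trials) are hypothetical and chosen only for illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(x) = n! / (x1! x2! ... xk!) * p1^x1 * p2^x2 * ... * pk^xk."""
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)   # each partial quotient is an exact integer
    return coeff * prod(p**x for p, x in zip(probs, counts))

# Hypothetical experiment: 6 trials, three outcomes with probabilities
# 0.5, 0.3, and 0.2, observed 3, 2, and 1 times respectively.
print(multinomial_pmf([3, 2, 1], [0.5, 0.3, 0.2]))
```

With only two outcomes (k = 2), the same function reduces to the binomial probability formula.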

Binomial Distribution 


The binomial distribution applet allows you to simulate values from a binomial 
distribution. You can specify the parameters for the binomial distribution 
(n and p) and the number of values to be simulated (N). When you click 
SIMULATE, N values from the specified binomial distribution will be plotted at 
the right. The frequency of each outcome is shown in the plot. 


[Applet panel: input fields n: 10, p: 0.5, N: 100; plot of Outcomes]


Explore 


Step 1 Specify a value of n. 
Step 2 Specify a value of p. 
Step 3 Specify a value of N. 
Step 4 Click SIMULATE. 
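
The four steps above can be imitated without the applet. The following Python sketch is an assumed stand-in for the applet, not its actual implementation; the function name `simulate_binomial` and the `seed` parameter are illustrative additions.

```python
import random

def simulate_binomial(n, p, N, seed=None):
    """Simulate N values, each the number of successes in n independent trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(N)]

# Steps 1-4: specify n, p, and N, then simulate.
values = simulate_binomial(n=10, p=0.5, N=100, seed=1)

# Frequency of each outcome, shown as a text plot.
freq = {x: values.count(x) for x in range(11)}
for x, count in freq.items():
    print(f"{x:2d} {'#' * count}")
```

Dividing a frequency by N gives an estimate of the corresponding probability, and the estimate tends to improve as N increases.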


Draw Conclusions 


1. During a presidential election year, 70% of a county’s eligible voters actually 
vote. Simulate selecting n = 10 eligible voters N = 10 times (for 10 communities 
in the county). Use the results to estimate the probability that the number 
who voted in this election is (a) exactly 5, (b) at least 8, and (c) at most 7. 


2. During a non-presidential election year, 20% of the eligible voters in the same 
county as in Exercise 1 actually vote. Simulate selecting n = 10 eligible voters 
N = 10 times (for 10 communities in the county). Use the results to estimate 
the probability that the number who voted in this election is (a) exactly 4, (b) 
at least 5, and (c) less than 4. 


3. Suppose in Exercise 1 you select n = 10 eligible voters N = 100 times. 
Estimate the probability that the number who voted in this election is exactly 
5. Compare this result with the result in Exercise 1 part (a). Which of these is 
closer to the probability found using the binomial probability formula? 


Binomial Distribution of Airplane Accidents 


The Air Transport Association of America (ATA) is a support 
organization for the principal U.S. airlines. Some of the ATA’s 
activities include promoting the air transport industry and 
conducting industry-wide studies. 

The ATA also keeps statistics about commercial airline 
flights, including those that involve accidents. From 1979 
through 2008 for aircraft with 10 or more seats, there were 
76 fatal commercial airplane accidents involving U.S. airlines. 
The distribution of these accidents is shown in the histogram 
at the right. 


[Histogram: Fatal Commercial Airplane Accidents per Year (1979-2008); horizontal axis: Number of accidents (0 to 6), vertical axis: Frequency]


Year       1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993
Accidents     4    0    4    4    4    1    4    2    4    3    5    4    3    3    1


Year       1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
Accidents     4    1    3    3    1    2    2    6    0    2    1    3    2    0    0

EXERCISES 


1. In 2006, there were about 11 million 
commercial flights in the United States. 
If one is selected at random, what is the 
probability that it involved a fatal accident? 


2. Suppose that the probability of a fatal 
accident in a given year is 0.0000004. 
A binomial probability distribution for 
n = 11,000,000 and p = 0.0000004 with 
x = 0 to 12 is shown. 


[Histogram: P(x) versus Number of accidents, x = 0 to 12]


What is the probability that there will be 
(a) 4 fatal accidents in a year? (b) 10 fatal 
accidents? (c) between 1 and 5, inclusive? 


3. Construct a binomial distribution for 
n = 11,000,000 and p = 0.0000008 with 
x = 0 to 12. Compare your results with the 
distribution in Exercise 2. 


4. Is a binomial distribution a good model 
for determining the probabilities of various 
numbers of fatal accidents during a year? 
Explain your reasoning and include a 
discussion of the four criteria for a binomial 
experiment. 


5. According to analysis by USA TODAY, air 
flight is so safe that a person “would have 
to fly every day for more than 64,000 years 
before dying in an accident.” How can such 
a statement be justified? 

WHAT YOU SHOULD LEARN 


> How to find probabilities using the geometric distribution 


> How to find probabilities using the Poisson distribution 


STUDY TIP 


Here are instructions for finding a geometric probability on a 
TI-83/84 Plus. 


2nd DISTR 

D: geometpdf( 

Enter the values of p and x separated by commas. 

ENTER 


Using a TI-83/84 Plus, you can find the probabilities used in Example 1 
automatically. 


geometpdf(.74,3) 
          .050024 
geometpdf(.74,4) 
        .01300624 


More Discrete Probability Distributions 


The Geometric Distribution » The Poisson Distribution » Summary of 
Discrete Probability Distributions 


> THE GEOMETRIC DISTRIBUTION 


In this section, you will study two more discrete probability distributions—the 
geometric distribution and the Poisson distribution. 

Many actions in life are repeated until a success occurs. For instance, a CPA 
candidate might take the CPA exam several times before receiving a passing 
score, or you might have to send an e-mail several times before it is successfully 
sent. Situations such as these can be represented by a geometric distribution. 


DEFINITION 


A geometric distribution is a discrete probability distribution of a random 
variable x that satisfies the following conditions. 


1. A trial is repeated until a success occurs. 
2. The repeated trials are independent of each other. 
3. The probability of success p is constant for each trial. 


4. The random variable x represents the number of the trial in which the first 
success occurs. 


The probability that the first success will occur on trial number x is 


$P(x) = pq^{x-1}$, where $q = 1 - p$. 


In other words, when the first success occurs on the third trial, the outcome 
is FFS, and the probability is $P(3) = q \cdot q \cdot p$, or $P(3) = p \cdot q^2$. 


EXAMPLE 1 


» Finding Probabilities Using the Geometric Distribution 


Basketball player LeBron James makes a free throw shot about 74% of the 
time. Find the probability that the first free throw shot LeBron makes occurs 
on the third or fourth attempt. (Source: ESPN) 


> Solution To find the probability that LeBron makes his first free throw 
shot on the third or fourth attempt, first find the probability that the first shot 
he makes will occur on the third attempt and the probability that the first shot 
he makes will occur on the fourth attempt. Then, find the sum of the resulting 
probabilities. Using p = 0.74, q = 0.26, and x = 3, you have 


$P(3) = 0.74 \cdot (0.26)^2 = 0.050024$. 

Using p = 0.74, q = 0.26, and x = 4, you have 

$P(4) = 0.74 \cdot (0.26)^3 \approx 0.013006$. 


So, the probability that LeBron makes his first free throw shot on the third or 
fourth attempt is 


$P(\text{shot made on third or fourth attempt}) = P(3) + P(4)$ 
$= 0.050024 + 0.013006 \approx 0.063$. 
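
The arithmetic in Example 1 can be verified with a short Python sketch. The helper name `geometric_pmf` is illustrative and not taken from the text.

```python
def geometric_pmf(p, x):
    """Probability that the first success occurs on trial number x."""
    return p * (1 - p)**(x - 1)

p3 = geometric_pmf(0.74, 3)   # first made shot on the third attempt
p4 = geometric_pmf(0.74, 4)   # first made shot on the fourth attempt

print(round(p3, 6), round(p4, 6))   # 0.050024 0.013006
print(round(p3 + p4, 3))            # 0.063
```

The two events are mutually exclusive, which is why their probabilities can simply be added.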

> Try It Yourself 1 


Find the probability that LeBron makes his first free throw shot before his 
third attempt. 


a. Use the geometric distribution to find P(1) and P(2). 
b. Find the sum of P(1) and P(2). 
c. Write the result as a sentence. Answer: Page A37 


Even though theoretically a success may never occur, the geometric distribution 
is a discrete probability distribution because the values of x can be listed—1, 2, 
3,.... Notice that as x becomes larger, P(x) gets closer to zero. For instance, 


$P(15) = 0.74(0.26)^{14} \approx 0.0000000048$. 
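
To see how quickly these probabilities shrink, the sketch below adds P(1) through P(15) for p = 0.74 and compares the running total with the closed form $1 - q^{15}$, which follows from summing the geometric series. It is a minimal illustration in Python with arbitrary variable names.

```python
p = 0.74
q = 1 - p

total = 0.0
for x in range(1, 16):
    prob = p * q**(x - 1)    # P(x) = p * q^(x - 1)
    total += prob

print(prob)             # P(15), roughly 4.8e-09
print(total)            # very close to 1
print(1 - q**15)        # closed form for P(1) + ... + P(15)
```

Because the partial sums approach 1, the probabilities behave as a discrete probability distribution should, even though x has no largest possible value.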


> THE POISSON DISTRIBUTION 


In a binomial experiment, you are interested in finding the probability of a 
specific number of successes in a given number of trials. Suppose instead that you 
want to know the probability that a specific number of occurrences takes place 
within a given unit of time or space. For instance, to determine the probability 
that an employee will take 15 sick days within a year, you can use the Poisson 
distribution. 


DEFINITION 

The Poisson distribution is a discrete probability distribution of a random 
variable x that satisfies the following conditions. 

1. The experiment consists of counting the number of times x an event occurs 
in a given interval. The interval can be an interval of time, area, 
or volume. 

2. The probability of the event occurring is the same for each interval. 

3. The number of occurrences in one interval is independent of the number of 
occurrences in other intervals. 


The probability of exactly x occurrences in an interval is 

$P(x) = \dfrac{\mu^x e^{-\mu}}{x!}$ 

where e is an irrational number approximately equal to 2.71828 and μ is the 
mean number of occurrences per interval unit. 


STUDY TIP 

Here are instructions for finding a Poisson probability on a 
TI-83/84 Plus. 

2nd DISTR 

B: poissonpdf( 

Enter the values of μ and x separated by commas. 

ENTER 


EXAMPLE 2 (Report 19) 


» Using the Poisson Distribution 


The mean number of accidents per month at a certain intersection is three. 
What is the probability that in any given month four accidents will occur at this 
intersection? 


> Solution 

Using x = 4 and μ = 3, the probability that 4 accidents will occur in any given 
month at the intersection is 

$P(4) \approx \dfrac{3^4 (2.71828)^{-3}}{4!} \approx 0.168$. 


Using a TI-83/84 Plus, you can find the probability in Example 2 
automatically. 
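
The Poisson probability in Example 2 can also be computed directly from the formula. The following Python sketch uses only the standard library; the helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(mu, x):
    """Probability of exactly x occurrences when the mean per interval is mu."""
    return mu**x * exp(-mu) / factorial(x)

# Example 2: mean of 3 accidents per month, exactly 4 accidents.
print(round(poisson_pmf(3, 4), 3))   # 0.168
```

Here `exp(-mu)` plays the role of $e^{-\mu}$, so no rounded value of e is needed.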

> Try It Yourself 2 


What is the probability that more than four accidents will occur in any given 
month at the intersection? 


a. Use the Poisson distribution to find P(0), P(1), P(2), P(3), and P(4). 

b. Find the sum of P(0), P(1), P(2), P(3), and P(4). 

c. Subtract the sum from 1. 

d. Write the result as a sentence. Answer: Page A37 


In Example 2, you used a formula to determine a Poisson probability. You can 
also use a table to find Poisson probabilities. Table 3 in Appendix B lists the 
Poisson probabilities for selected values of x and μ. You can use technology tools, 
such as MINITAB, Excel, and the TI-83/84 Plus, to find Poisson probabilities 
as well. 


The first successful suspension bridge built in the United States, 
the Tacoma Narrows Bridge, spans the Tacoma Narrows in 
Washington State. The average occupancy of vehicles that travel 
across the bridge is 1.6. The following probability distribution 
represents the vehicle occupancy on the bridge during a five-day 
period. (Adapted from Washington State Department of Transportation) 

[Histogram: P(x) versus Number of people]

What is the probability that a randomly selected vehicle has 
two occupants or fewer? 


EXAMPLE 3 


» Finding Poisson Probabilities Using a Table 


A population count shows that the average number of rabbits per acre living 
in a field is 3.6. Use a table to find the probability that seven rabbits are found 
on any given acre of the field. 


> Solution 

A portion of Table 3 in Appendix B is shown here. Using the distribution for 
μ = 3.6 and x = 7, you can find the Poisson probability as shown by the 
highlighted areas in the table. 


                           μ
 x     3.1    3.2    3.3    3.4    3.5    3.6    3.7
 0   .0450  .0408  .0369  .0334  .0302  .0273  .0247
 1   .1397  .1304  .1217  .1135  .1057  .0984  .0915
 2   .2165  .2087  .2008  .1929  .1850  .1771  .1692
 3   .2237  .2226  .2209  .2186  .2158  .2125  .2087
 4   .1734  .1781  .1823  .1858  .1888  .1912  .1931
 5   .1075  .1140  .1203  .1264  .1322  .1377  .1429
 6   .0555  .0608  .0662  .0716  .0771  .0826  .0881
 7   .0246  .0278  .0312  .0348  .0385  .0425  .0466
 8   .0095  .0111  .0129  .0148  .0169  .0191  .0215
 9   .0033  .0040  .0047  .0056  .0066  .0076  .0089
10   .0010  .0013  .0016  .0019  .0023  .0028  .0033


Interpretation So, the probability that seven rabbits are found on any given 
acre is 0.0425. Because 0.0425 is less than 0.05, this can be considered an 
unusual event. 
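
The table value used in Example 3 can be regenerated from the Poisson formula. The sketch below computes the probabilities for a mean of 3.6 and x = 0 through 10, rounded to four decimal places; it is an illustration in Python, and a value computed this way can occasionally differ from a printed table in the last digit.

```python
from math import exp, factorial

mu = 3.6   # average number of rabbits per acre

# Poisson probabilities for x = 0, 1, ..., 10, rounded like a printed table
column = [round(mu**x * exp(-mu) / factorial(x), 4) for x in range(11)]

for x, prob in enumerate(column):
    print(f"{x:2d}  {prob:.4f}")

print(column[7] < 0.05)   # True: finding seven rabbits on an acre is unusual
```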


> Try It Yourself 3 

Two thousand brown trout are introduced into a small lake. The lake has a 
volume of 20,000 cubic meters. Use a table to find the probability that three 
brown trout are found in any given cubic meter of the lake. 


a. Find the average number of brown trout per cubic meter. 

b. Identify μ and x. 

c. Use Table 3 in Appendix B to find the Poisson probability. 

d. Interpret the results. 

e. Determine if the event is unusual. Explain. Answer: Page A37 

> SUMMARY OF DISCRETE PROBABILITY DISTRIBUTIONS 


The following table summarizes the discrete probability distributions discussed 
in this chapter. 


A binomial experiment satisfies the following conditions.

1. The experiment is repeated for a fixed number n of independent trials.
2. There are only two possible outcomes for each trial. Each outcome can be
   classified as a success or as a failure.
3. The probability of a success must remain constant for each trial.
4. The random variable x counts the number of successful trials.

The parameters of a binomial distribution are n and p.

n = the number of times a trial repeats
x = the number of successes in n trials
p = probability of success in a single trial
q = probability of failure in a single trial
q = 1 - p

The probability of exactly x successes in n trials is

$P(x) = {}_{n}C_{x}\, p^{x} q^{n-x}$.


A geometric distribution is a discrete probability distribution of a random
variable x that satisfies the following conditions.

1. A trial is repeated until a success occurs.
2. The repeated trials are independent of each other.
3. The probability of success p is constant for each trial.
4. The random variable x represents the number of the trial in which the
   first success occurs.

The parameter of a geometric distribution is p.

x = the number of the trial in which the first success occurs
p = probability of success in a single trial
q = probability of failure in a single trial
q = 1 - p

The probability that the first success occurs on trial number x is

$P(x) = p\, q^{x-1}$.


The Poisson distribution is a discrete probability distribution of a random
variable x that satisfies the following conditions.

1. The experiment consists of counting the number of times x an event occurs
   over a specified interval of time, area, or volume.
2. The probability of the event occurring is the same for each interval.
3. The number of occurrences in one interval is independent of the number of
   occurrences in other intervals.

The parameter of a Poisson distribution is μ.

x = the number of occurrences in the given interval
μ = the mean number of occurrences in a given time or space unit

The probability of exactly x occurrences in an interval is

$P(x) = \dfrac{\mu^{x} e^{-\mu}}{x!}$.
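
The three formulas in the summary translate directly into code. The sketch below is one possible rendering, assuming Python 3.8 or later (for `math.comb`); the function names and the sample arguments are hypothetical and chosen only for illustration.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x) = nCx * p^x * q^(n-x): exactly x successes in n trials."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)


def geometric_pmf(x: int, p: float) -> float:
    """P(x) = p * q^(x-1): first success occurs on trial number x."""
    q = 1 - p
    return p * q ** (x - 1)


def poisson_pmf(x: int, mu: float) -> float:
    """P(x) = mu^x * e^(-mu) / x!: exactly x occurrences in an interval."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


# Hypothetical inputs, not taken from the exercises.
print(binomial_pmf(2, 10, 0.3))
print(geometric_pmf(4, 0.3))
print(poisson_pmf(2, 1.2))
```

Each function takes the random-variable value x first and the distribution's parameters after it, mirroring the notation P(x) used in the table.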
DISCRETE PROBABILITY DISTRIBUTIONS


BUILDING BASIC SKILLS AND VOCABULARY


In Exercises 1-4, assume the geometric distribution applies. Use the given 
probability of success p to find the indicated probability. 


1. Find P(3) when p = 0.65. 2. Find P(1) when p = 0.45. 
3. Find P(5) when p = 0.09. 4. Find P(8) when p = 0.28. 


In Exercises 5-8, assume the Poisson distribution applies. Use the given mean μ to
find the indicated probability.


5. Find P(4) when μ = 5. 6. Find P(3) when μ = 6.
7. Find P(2) when μ = 1.5. 8. Find P(5) when μ = 9.8.


9. In your own words, describe the difference between the value of x in a 
binomial distribution and in a geometric distribution. 


10. In your own words, describe the difference between the value of x in a 
binomial distribution and in a Poisson distribution. 


Deciding on a Distribution In Exercises 11-14, decide which probability
distribution—binomial, geometric, or Poisson—applies to the question. You do not 
need to answer the question. Instead, justify your choice. 


11. Pilot’s Test Given: The probability that a student passes the written test 
for a private pilot’s license is 0.75. Question: What is the probability 
that a student will fail on the first attempt and pass on the second attempt? 


12. Precipitation Given: In Tampa, Florida, the mean number of days in July 
with 0.01 inch or more precipitation is 16. Question: What is the probability 
that Tampa has 20 days with 0.01 inch or more precipitation next July? 
(Source: National Climatic Data Center) 


13. Carry-On Luggage Given: Fifty-four percent of U.S. adults think Congress 
should place size limits on carry-on bags. In a survey of 110 randomly chosen 
adults, people are asked, “Do you think Congress should place size limits on 
carry-on bags?” Question: What is the probability that exactly 60 of the 
people answer yes? (Source: TripAdvisor) 


14. Breaking Up Given: Twenty-nine percent of Americans ages 16 to 21 years 
old say that they would break up with their boyfriend/girlfriend for $10,000. 
You select at random twenty 16- to 21-year-olds. Question: What is the 
probability that the first person who says he or she would break up with their 
boyfriend/girlfriend for $10,000 is the fifth person selected? (Source: Bank of 
America Student Banking & Seventeen) 


USING AND INTERPRETING CONCEPTS


Using a Distribution to Find Probabilities In Exercises 15-22, find the 
indicated probabilities using the geometric distribution or the Poisson distribution. 
Then determine if the events are unusual. If convenient, use a Poisson probability 
table or technology to find the probabilities. 


15. Telephone Sales Assume the probability that you will make a sale on any 
given telephone call is 0.19. Find the probability that you (a) make your first 
sale on the fifth call, (b) make your first sale on the first, second, or third call, 
and (c) do not make a sale on the first three calls. 

16. Bankruptcies The mean number of bankruptcies filed per minute in the
United States in a recent year was about two. Find the probability that
(a) exactly five businesses will file bankruptcy in any given minute, (b) at
least five businesses will file bankruptcy in any given minute, and (c) more
than five businesses will file bankruptcy in any given minute. (Source:
Administrative Office of the U.S. Courts)


17. Typographical Errors A newspaper finds that the mean number of
typographical errors per page is four. Find the probability that (a) exactly three
typographical errors are found on a page, (b) at most three typographical
errors are found on a page, and (c) more than three typographical errors are
found on a page.


18. Pass Completions Football player Peyton Manning completes a pass 64.8%
of the time. Find the probability that (a) the first pass Peyton completes is the
second pass, (b) the first pass Peyton completes is the first or second pass, and
(c) Peyton does not complete his first two passes. (Source: National Football
League)


19. Major Hurricanes A major hurricane is a hurricane with wind speeds of
111 miles per hour or greater. During the 20th century, the mean number of
major hurricanes to strike the U.S. mainland per year was about 0.6. Find the
probability that in a given year (a) exactly one major hurricane strikes the
U.S. mainland, (b) at most one major hurricane strikes the U.S. mainland,
and (c) more than one major hurricane strikes the U.S. mainland. (Source:
National Hurricane Center)


20. Glass Manufacturer A glass manufacturer finds that 1 in every 500 glass
items produced is warped. Find the probability that (a) the first warped glass
item is the tenth item produced, (b) the first warped glass item is the first,
second, or third item produced, and (c) none of the first 10 glass items
produced are defective.


21. Winning a Prize A cereal maker places a game piece in each of its cereal
boxes. The probability of winning a prize in the game is 1 in 4. Find the
probability that you (a) win your first prize with your fourth purchase,
(b) win your first prize with your first, second, or third purchase, and (c) do
not win a prize with your first four purchases.


22. Precipitation The mean number of days with 0.01 inch or more
precipitation per month in Baltimore, Maryland, is about 9.5. Find the
probability that in a given month, (a) there are exactly 10 days with 0.01 inch
or more precipitation, (b) there are at most 10 days with 0.01 inch or more
precipitation, and (c) there are more than 10 days with 0.01 inch or more
precipitation. (Source: National Climatic Data Center)


In Exercises 23 and 24, use the StatCrunch Poisson calculator to find the indicated
probabilities. Then determine if the events are unusual. Explain your reasoning.


23. Oil Tankers The mean number of oil tankers at a port city is 8 per day.
The port has facilities to handle up to 12 oil tankers in a day. Find the
probability that on a given day, (a) eight oil tankers will arrive, (b) at most
three oil tankers will arrive, and (c) too many oil tankers will arrive.


24. Kidney Transplants The mean number of kidney transplants performed per
day in the United States in a recent year was about 45. Find the probability
that on a given day, (a) exactly 50 kidney transplants will be performed,
(b) at least 65 kidney transplants will be performed, and (c) no more than
40 kidney transplants will be performed. (Source: U.S. Department of Health
and Human Services)


EXTENDING CONCEPTS


25. Comparing Binomial and Poisson Distributions An automobile manufacturer
finds that 1 in every 2500 automobiles produced has a particular
manufacturing defect. (a) Use a binomial distribution to find the probability
of finding 4 cars with the defect in a random sample of 6000 cars. (b) The
Poisson distribution can be used to approximate the binomial distribution
for large values of n and small values of p. Repeat (a) using a Poisson
distribution and compare the results.
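
The approximation mentioned in part (b) can be explored numerically. The sketch below uses hypothetical values (n = 1000, p = 0.002, x = 3) rather than the exercise's numbers, and takes μ = np as the Poisson mean, which is the mean of the binomial distribution being approximated.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability of x successes in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x: int, mu: float) -> float:
    """Poisson probability of x occurrences with mean mu."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


# Hypothetical inputs: large n, small p.
n, p, x = 1000, 0.002, 3
exact = binomial_pmf(x, n, p)
approx = poisson_pmf(x, n * p)  # Poisson mean set to mu = n * p

print(f"binomial: {exact:.5f}   Poisson approximation: {approx:.5f}")
```

With inputs like these the two values should agree to about three decimal places; the agreement generally weakens as p grows or n shrinks.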


26. Hypergeometric Distribution Binomial experiments require that any
sampling be done with replacement because each trial must be independent
of the others. The hypergeometric distribution also has two outcomes:
success and failure. However, the sampling is done without replacement.
Given a population of N items having k successes and N - k failures, the
probability of selecting a sample of size n that has x successes and n - x
failures is given by

$P(x) = \dfrac{({}_{k}C_{x})({}_{N-k}C_{n-x})}{{}_{N}C_{n}}$.

In a shipment of 15 microchips, 2 are defective and 13 are not defective.
A sample of three microchips is chosen at random. Find the probability
that (a) all three microchips are not defective, (b) one microchip is defective
and two are not defective, and (c) two microchips are defective and one is not
defective.
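
As a rough illustration of the hypergeometric formula, the sketch below evaluates it with `math.comb` for a hypothetical population (N = 10 items, k = 3 successes, sample size n = 4), not the microchip shipment in the exercise. The function name and argument order are assumptions made for this example.

```python
import math


def hypergeometric_pmf(x: int, N: int, k: int, n: int) -> float:
    """Probability of x successes in a sample of n drawn without
    replacement from N items, k of which are successes."""
    return math.comb(k, x) * math.comb(N - k, n - x) / math.comb(N, n)


# Hypothetical population: 10 items, 3 successes, sample of 4.
N, k, n = 10, 3, 4
for x in range(0, min(k, n) + 1):
    print(x, hypergeometric_pmf(x, N, k, n))
```

For x = 1 the formula gives (3)(35)/210 = 0.5, and the probabilities over all possible x sum to 1.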


Geometric Distribution: Mean and Variance In Exercises 27 and 28, use
the fact that the mean of a geometric distribution is $\mu = 1/p$ and the
variance is $\sigma^2 = q/p^2$.
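
These two facts can be checked numerically against the definition of the mean as a weighted sum. The sketch below uses a hypothetical p = 0.25 and truncates the infinite sum at 1000 terms, which is an approximation chosen for illustration.

```python
import math


def geometric_summary(p: float) -> tuple[float, float, float]:
    """Mean, variance, and standard deviation of a geometric distribution."""
    q = 1 - p
    mean = 1 / p
    variance = q / p ** 2
    return mean, variance, math.sqrt(variance)


# Hypothetical success probability.
p = 0.25
mean, variance, std_dev = geometric_summary(p)

# Cross-check the mean with a truncated sum of x * P(x), P(x) = p * q^(x-1).
approx_mean = sum(x * p * (1 - p) ** (x - 1) for x in range(1, 1001))

print(mean, variance, round(std_dev, 3), round(approx_mean, 6))
```

For p = 0.25 the formulas give a mean of 4 trials and a variance of 12.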


27. Daily Lottery A daily number lottery chooses three balls numbered 0 to 9.
The probability of winning the lottery is 1/1000. Let x be the number of times
you play the lottery before winning the first time. (a) Find the mean,
variance, and standard deviation. Interpret the results. (b) How many times
would you expect to have to play the lottery before winning? Assume that it
costs $1 to play and winners are paid $500. Would you expect to make or lose
money playing this lottery? Explain.


28. Paycheck Errors A company assumes that 0.5% of the paychecks for a year
were calculated incorrectly. The company has 200 employees and examines
the payroll records from one month. (a) Find the mean, variance, and
standard deviation. Interpret the results. (b) How many employee payroll
records would you expect to examine before finding one with an error?


Poisson Distribution: Variance In Exercises 29 and 30, use the fact that the
variance of a Poisson distribution is $\sigma^2 = \mu$.


29. Golf In a recent year, the mean number of strokes per hole for golfer Phil
Mickelson was about 3.9. (a) Find the variance and standard deviation.
Interpret the results. (b) How likely is Phil to play an 18-hole round and
have more than 72 strokes? (Source: PGATour.com)


30. Snowfall The mean snowfall in January in Mount Shasta, California is
29.9 inches. (a) Find the variance and standard deviation. Interpret the
results. (b) Find the probability that the snowfall in January in Mount Shasta,
California will exceed 3 feet. (Source: National Climatic Data Center)
Uses


There are countless occurrences of binomial probability distributions 
in business, science, engineering, and many other fields. 

For instance, suppose you work for a marketing agency and are in charge of 
creating a television ad for Brand A toothpaste. The toothpaste manufacturer 
claims that 40% of toothpaste buyers prefer its brand. To check whether the 
manufacturer’s claim is reasonable, your agency conducts a survey. Of 100 
toothpaste buyers selected at random, you find that only 35 (or 35%) prefer 
Brand A. Could the manufacturer’s claim still be true? What if your random 
sample of 100 found only 25 people (or 25%) who express a preference for 
Brand A? Would you still be justified in running the advertisement? 

Knowing the characteristics of binomial probability distributions will help 
you answer this type of question. By the time you have completed this course, 
you will be able to make educated decisions about the reasonableness of the
manufacturer’s claim. 


Ethics 


Suppose the toothpaste manufacturer also claims that four out of five dentists 
recommend Brand A toothpaste. Your agency wants to mention this fact in the 
television ad, but when determining how the sample of dentists was formed, 
you find that the dentists were paid to recommend the toothpaste. Including 
this statement when running the advertisement would be unethical. 


Abuses 


Interpreting the “Most Likely” Outcome A common misuse of binomial 
probability distributions is to think that the “most likely” outcome is the 
outcome that will occur most of the time. For instance, suppose you randomly 
choose a committee of four from a large population that is 50% women and 
50% men. The most likely composition of the committee will be two men and 
two women. Although this is the most likely outcome, the probability that it 
will occur is only 0.375. There is a 0.5 chance that the committee will contain 
one man and three women or three men and one woman. So, if either of these 
outcomes occurs, you should not assume that the selection was unusual or 
biased. 
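
The two probabilities quoted for the committee example follow from the binomial formula with n = 4 and p = 0.5. The sketch below recomputes them; treating the four selections as independent trials is an assumption that the large population makes reasonable, and the function name is illustrative.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 4, 0.5  # committee of four; each member is a woman with probability 0.5

p_two_women = binomial_pmf(2, n, p)
p_three_to_one = binomial_pmf(1, n, p) + binomial_pmf(3, n, p)

print(p_two_women, p_three_to_one)
```

The even split is the single most likely outcome, yet a three-to-one split in either direction is more probable than the even split.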


EXERCISES


In Exercises 1-4, suppose that the manufacturer's claim is true: 40% of
toothpaste buyers prefer Brand A toothpaste. Use the graph and technology to answer
the questions. Explain your reasoning.

[Figure: probability histogram; horizontal axis "Number who prefer Brand A" (25 to 55), vertical axis "Probability".]


1. Interpreting the “Most Likely” Outcome In a random sample
of 100, what is the most likely outcome? How likely is it?


2. Interpreting the “Most Likely” Outcome In a random sample
of 100, what is the probability that between 35 and 45 people,
inclusive, prefer Brand A?


3. Suppose in a random sample of 100, you found 36 who prefer
Brand A. Would the manufacturer’s claim be believable?


4. Suppose in a random sample of 100, you found 25 who prefer
Brand A. Would the manufacturer’s claim be believable?
4 CHAPTER SUMMARY


What did you learn? (with the corresponding Examples and Review Exercises)

Section 4.1

How to distinguish between discrete random variables and continuous
random variables (Example 1; Review Exercises 1-6)

How to determine if a distribution is a probability distribution
(Examples 3-4; Review Exercises 7-10)

How to construct a discrete probability distribution and its graph and find the
mean, variance, and standard deviation of a discrete probability distribution
(Examples 2, 5, 6; Review Exercises 11-14)

    $\mu = \sum x P(x)$    Mean of a discrete random variable
    $\sigma^2 = \sum (x - \mu)^2 P(x)$    Variance of a discrete random variable
    $\sigma = \sqrt{\sigma^2} = \sqrt{\sum (x - \mu)^2 P(x)}$    Standard deviation of a discrete random variable

How to find the expected value of a discrete probability distribution
(Example 7; Review Exercises 15-16)


Section 4.2

How to determine if a probability experiment is a binomial experiment
(Example 1; Review Exercises 17-20)

How to find binomial probabilities using the binomial probability formula,
a binomial probability table, and technology
(Examples 2, 4-6; Review Exercises 21-24)

    $P(x) = {}_{n}C_{x}\, p^{x} q^{n-x} = \dfrac{n!}{(n-x)!\,x!}\, p^{x} q^{n-x}$    Binomial probability formula

How to construct a binomial distribution and its graph and find the mean,
variance, and standard deviation of a binomial probability distribution
(Examples 3, 7, 8; Review Exercises 25-28)

    $\mu = np$    Mean of a binomial distribution
    $\sigma^2 = npq$    Variance of a binomial distribution
    $\sigma = \sqrt{npq}$    Standard deviation of a binomial distribution


Section 4.3

How to find probabilities using the geometric distribution
(Example 1; Review Exercises 29, 30)

    $P(x) = p\, q^{x-1}$    Probability that the first success will occur on trial number x

How to find probabilities using the Poisson distribution
(Examples 2, 3; Review Exercises 31-33)

    $P(x) = \dfrac{\mu^{x} e^{-\mu}}{x!}$    Probability of exactly x occurrences in an interval
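
The Section 4.1 formulas for the mean, variance, and standard deviation of a discrete random variable can be sketched in a few lines. The distribution below is hypothetical and is not taken from any exercise; the check that the probabilities lie between 0 and 1 and sum to 1 reflects the two properties a probability distribution must satisfy.

```python
import math


def is_probability_distribution(probs, tol=1e-9):
    """Each P(x) is between 0 and 1, and the probabilities sum to 1."""
    return all(0 <= p <= 1 for p in probs) and abs(sum(probs) - 1) <= tol


def discrete_summary(xs, probs):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mean = sum(x * p for x, p in zip(xs, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))
    return mean, variance, math.sqrt(variance)


# Hypothetical distribution used only to exercise the formulas.
xs = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]

valid = is_probability_distribution(probs)
mean, variance, std_dev = discrete_summary(xs, probs)
print(valid, mean, variance, std_dev)
```

For this distribution the mean works out to 2 and the variance to 1, up to floating-point rounding.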
REVIEW EXERCISES


SECTION 4.1


In Exercises 1 and 2, decide whether the graph represents a discrete random
variable or a continuous random variable. Explain your reasoning.


1. The number of hours spent sleeping each day
[Graph: number line marked 0, 4, 8, 12, 16, 20, 24]

2. The number of fish caught during a fishing tournament
[Graph: number line marked 0, 2, 4, 6, 8, 10]


In Exercises 3—6, decide whether the random variable x is discrete or continuous. 


3. Let x represent the number of pumps in use at a gas station. 
4. Let x represent the weight of a truck at a weigh station. 


5. Let x represent the amount of carbon dioxide emitted from a car’s tailpipe 
each day. 


6. Let x represent the number of people that activate a metal detector at an 
airport each hour. 


In Exercises 7-10, decide whether the distribution is a probability distribution. If it 
is not, identify the property that is not satisfied. 


7. The daily limit for catching bass at a lake is four. The random variable x 
represents the number of fish caught in a day. 


x      0     1     2     3     4
P(x)   0.36  0.23  0.08  0.14  0.29


8. The random variable x represents the number of tickets a police officer 
writes out each shift. 


x      0     1     2     3     4     5
P(x)   0.09  0.23  0.29  0.16  0.21  0.02


9. A greeting card shop keeps records of customers’ buying habits. The random
variable x represents the number of cards sold to an individual customer in
a shopping visit.


x      1     2     3     4     5     6     7
P(x)   0.68  0.14  0.08  0.05  0.02  0.02  0.01


10. The random variable x represents the number of classes in which a student is 
enrolled in a given semester at a university. 


\o 


2 3 4 5 6 7 8 


L 2 1 
75 10 25 20 5 25 | 120 
In Exercises 11-14,

(a) use the frequency distribution table to construct a probability distribution,

(b) graph the probability distribution using a histogram and describe its shape,

(c) find the mean, variance, and standard deviation of the probability distribution, and

(d) interpret the results in the context of the real-life situation.


11. The number of pages in a section from 10 statistics texts

Pages   Frequency
2       3
3       12
4       72
5       115
6       169
7       120
8       83
9       48
10      22
11      6


12. The number of hits per game played by a baseball player during a recent
season

Hits   Frequency
0      29
1      62
2      33
3      12
4      3
5      1


13. The distribution of the number of cellular phones per household in
a small town is given.

Cellular phones   Frequency
0                 [illegible]
1                 35
2                 68
3                 73
4                 42
5                 19
6                 8


14. A television station sells advertising in 15-, 30-, 60-, 90-, and 120-second
blocks. The distribution of sales for one 24-hour day is given.

Length (seconds)   Frequency
15                 76
30                 445
60                 30
90                 [illegible]
120                12


In Exercises 15 and 16, find the expected value of the random variable. 


15. A person has shares of eight different stocks. The random variable x
represents the number of stocks showing a loss on a selected day.

[Table: x and P(x). Only the x values 1 through 8 and the last four probabilities, 0.09, 0.05, 0.05, and 0.03, are legible.]


16. A local pub has a chicken wing special on Tuesdays. The pub owners purchase
wings in cases of 300. The random variable x represents the number of cases
used during the special.

x      1     2     3     4
P(x)   1/9   1/3   1/2   1/18
SECTION 4.2


In Exercises 17 and 18, use the following information. 


A probability experiment has n independent trials. Each trial has three possible 
outcomes: A, B, and C. For each trial, P(A) = 0.30, P(B) = 0.50, and 
P(C) = 0.20. There are 20 trials. 


17. Can a binomial experiment be used to find the probability of 6 outcomes of 
A, 10 outcomes of B, and 4 outcomes of C? Explain your reasoning. 


18. Can a binomial experiment be used to find the probability of 4 outcomes 
of C and 16 outcomes that are not C? Explain your reasoning. What is the 
probability of success for each trial? 


In Exercises 19 and 20, decide whether the experiment is a binomial experiment. If 
it is not, identify the property that is not satisfied. If it is, list the values of n, p, and
q and the values that x can assume. 


19. Bags of plain M&M’s contain 24% blue candies. One candy is selected from 
each of 12 bags. The random variable represents the number of blue candies 
selected. (Source: Mars, Incorporated) 


20. A fair coin is tossed repeatedly until 15 heads are obtained. The random 
variable x counts the number of tosses. 


In Exercises 21-24, find the indicated probabilities. 


21. One in four adults is currently on a diet. You randomly select eight adults 
and ask them if they are currently on a diet. Find the probability that the 
number who say they are currently on a diet is (a) exactly three, (b) at least 
three, and (c) more than three. (Source: Wirthlin Worldwide) 


22. One in four people in the United States owns individual stocks. You 
randomly select 12 people and ask them if they own individual stocks. Find 
the probability that the number who say they own individual stocks is 
(a) exactly two, (b) at least two, and (c) more than two. (Source: Pew Research 
Center) 


23. Forty-three percent of businesses in the United States require a doctor’s 
note when an employee takes sick time. You randomly select nine businesses 
and ask each if it requires a doctor’s note when an employee takes sick time. 
Find the probability that the number who say they require a doctor’s note is 
(a) exactly five, (b) at least five, and (c) more than five. (Source: Harvard 
School of Public Health) 


24. In a typical day, 31% of people in the United States with Internet access 
go online to get news. You randomly select five people in the United States 
with Internet access and ask them if they go online to get news. Find the 
probability that the number who say they go online to get news is (a) exactly 
two, (b) at least two, and (c) more than two. (Source: Pew Research Center) 


In Exercises 25-28, 
(a) construct a binomial distribution, 
(b) graph the binomial distribution using a histogram and describe its shape, 


(c) find the mean, variance, and standard deviation of the binomial distribution 
and interpret the results in the context of the real-life situation, and 


(d) determine the values of the random variable x that you would consider unusual. 


25. Thirty-four percent of women in the United States say their spouses never 
help with household chores. You randomly select five U.S. women and ask if 
their spouses help with household chores. (Source: Boston Consulting Group) 
26. Sixty-eight percent of families say that their children have an influence on
their vacation destinations. You randomly select six families and ask if their
children have an influence on their vacation destinations. (Source: YPB&R)


27. In a recent year, forty percent of trucks sold by a company had diesel
engines. You randomly select four trucks sold by the company and check if
they have diesel engines.


28. Sixty-three percent of U.S. mothers with school-age children choose fast
food as a dining option for their families one to three times a week. You
randomly select five U.S. mothers with school-age children and ask if they
choose fast food as a dining option for their families one to three times a
week. (Adapted from Market Day)


SECTION 4.3


In Exercises 29-32, find the indicated probabilities using the geometric distribution 
or the Poisson distribution. Then determine if the events are unusual. If convenient, 
use a Poisson probability table or technology to find the probabilities. 


29. Twenty-two percent of former smokers say they tried to quit four or more
times before they were habit-free. You randomly select 10 former smokers.
Find the probability that the first person who tried to quit four or more times
is (a) the third person selected, (b) the fourth or fifth person selected, and
(c) not one of the first seven people selected. (Source: Porter Novelli Health
Styles)


30. In a recent season, hockey player Sidney Crosby scored 33 goals in 77 games
he played. Assume that his goal production stayed at that level the following
season. What is the probability that he would get his first goal

(a) in the first game of the season?

(b) in the second game of the season?

(c) in the first or second game of the season?

(d) within the first three games of the season? (Source: ESPN)


31. During a 69-year period, tornadoes killed 6755 people in the United States.
Assume this rate holds true today and is constant throughout the year. Find
the probability that tomorrow

(a) no one in the U.S. is killed by a tornado,

(b) one person in the U.S. is killed by a tornado,

(c) at most two people in the U.S. are killed by a tornado, and

(d) more than one person in the U.S. is killed by a tornado. (Source: National
Weather Service)


32. It is estimated that sharks kill 10 people each year worldwide. Find the
probability that at least 3 people are killed by sharks this year

(a) assuming that this rate is true,

(b) if the rate is actually 5 people a year, and

(c) if the rate is actually 15 people a year. (Source: International Shark Attack
File)


33. In Exercise 32, describe what happens to the probability of at least three
people being killed by sharks this year as the rate increases and decreases.
CHAPTER QUIZ


TABLE FOR EXERCISE 2 


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


1. Decide if the random variable x is discrete or continuous. Explain your 
reasoning. 


(a) Let x represent the number of lightning strikes that occur in Wyoming 
during the month of June. 


(b) Let x represent the amount of fuel (in gallons) used by the Space Shuttle 
during takeoff. 


2. The table lists the number of U.S. mainland hurricane strikes (from 1851 to 
2008) for various intensities according to the Saffir-Simpson Hurricane Scale. 
(Source: National Oceanic and Atmospheric Administration) 


(a) Construct a probability distribution of the data. 


(b) Graph the discrete probability distribution using a probability histogram. 
Then describe its shape. 


(c) Find the mean, variance, and standard deviation of the probability 
distribution and interpret the results. 


(d) Find the probability that a hurricane selected at random for further study 
has an intensity of at least four. 
3. The success rate of corneal transplant surgery is 85%. The surgery is 
performed on six patients. (Adapted from St. Luke’s Cataract & Laser Institute) 
(a) Construct a binomial distribution. 


(b) Graph the binomial distribution using a probability histogram. Then 
describe its shape. 


(c) Find the mean, variance, and standard deviation of the probability 
distribution and interpret the results. 


(d) Find the probability that the surgery is successful for exactly three patients. 
Is this an unusual event? Explain. 


(e) Find the probability that the surgery is successful for fewer than four 
patients. Is this an unusual event? Explain. 


4. A newspaper finds that the mean number of typographical errors per page is 
five. Find the probability that 


(a) exactly five typographical errors will be found on a page, 

(b) fewer than five typographical errors will be found on a page, and 

(c) no typographical errors will be found on a page. 
In Exercises 5 and 6, use the following information. Basketball player Dwight 
Howard makes a free throw shot about 60.2% of the time. (Source: ESPN) 


5. Find the probability that the first free throw shot Dwight makes is the fourth 
shot. Is this an unusual event? Explain. 


6. Find the probability that the first free throw shot Dwight makes is the second 
or third shot. Is this an unusual event? Explain. 
The Centers for Disease Control and Prevention (CDC) is required by
law to publish a report on assisted reproductive technologies (ART). 
ART includes all fertility treatments in which both the egg and the 
sperm are used. These procedures generally involve removing eggs 
from a woman’s ovaries, combining them with sperm in the laboratory, 
and returning them to the woman’s body or giving them to another 
woman. 

You are helping to prepare the CDC report and select at random 
10 ART cycles for a special review. None of the cycles resulted in a 
clinical pregnancy. Your manager feels it is impossible to select at 
random 10 ART cycles that did not result in a clinical pregnancy. Use 
the information provided at the right and your knowledge of statistics 
to determine if your manager is correct. 


1. How Would You Do It? 


(a) How would you determine if your manager’s view is correct, 
that it is impossible to select at random 10 ART cycles that did 
not result in a clinical pregnancy? 


(b) What probability distribution do you think best describes the 
situation? Do you think the distribution of the number of 
clinical pregnancies is discrete or continuous? Why? 


2. Answering the Question 


Write an explanation that answers the question, “Is it possible to 
select at random 10 ART cycles that did not result in a clinical 
pregnancy?” Include in your explanation the appropriate 
probability distribution and your calculation of the probability of 
no clinical pregnancies in 10 ART cycles. 


3. Suspicious Samples? 
Which of the following samples would you consider suspicious if 
someone told you that the sample was selected at random? Would 
you believe that the samples were selected at random? Why or 
why not? 
(a) Selecting at random 10 ART cycles among women of age 40, 
eight of which resulted in clinical pregnancies. 


(b) Selecting at random 10 ART cycles among women of age 41, 
none of which resulted in clinical pregnancies. 


Real Statistics - Real Decisions


[Figure: Results of ART Cycles Using Fresh Nondonor Eggs or Embryos. Clinical pregnancy 34.9%; no pregnancy 64.3%; ectopic pregnancy 0.7%. (Source: Centers for Disease Control and Prevention)]


[Figure: Pregnancy and Live Birth Rates for ART Cycles Among Women of Age 40 and Older. Vertical axis: Percentage; bars show the pregnancy rate and the live birth rate by age, through "45 and older". (Source: Centers for Disease Control and Prevention)]
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USING POISSON DISTRIBUTIONS 
AS QUEUING MODELS 


Queuing means waiting in line to be served. There are many 
examples of queuing in everyday life: waiting at a traffic light, waiting 
in line at a grocery checkout counter, waiting for an elevator, holding 


for a telephone call, and so on. 


Poisson distributions are used to model and predict the number aul i 
of people (calls, computer programs, vehicles) arriving at the line. In 02 4 6 8 101214161820 
the following exercises, you are asked to use Poisson distributions to Number of arrivals per minute 


analyze the queues at a grocery store checkout counter. 


EXERCISES

In Exercises 1-7, consider a grocery store that can
process a total of four customers at its checkout
counters each minute.

1. Suppose that the mean number of customers
   who arrive at the checkout counters each
   minute is 4. Create a Poisson distribution with
   μ = 4 for x = 0 to 20. Compare your results
   with the histogram shown at the upper right.

2. MINITAB was used to generate 20 random
   numbers with a Poisson distribution for μ = 4.
   Let the random number represent the number
   of arrivals at the checkout counter each minute
   for 20 minutes.


Sys) se) oe eye) ef 8) 0) 
3 2 © 3 4d @ 2 yw 4 il 


   During each of the first four minutes, only
   three customers arrived. These customers could
   all be processed, so there were no customers
   waiting after four minutes.

   (a) How many customers were waiting after
       5 minutes? 6 minutes? 7 minutes? 8 minutes?

   (b) Create a table that shows the number of
       customers waiting at the end of 1 through
       20 minutes.

3. Generate a list of 20 random numbers with a
   Poisson distribution for μ = 4. Create a table
   that shows the number of customers waiting at
   the end of 1 through 20 minutes.

4. Suppose that the mean increases to 5 arrivals
   per minute. You can still process only four per
   minute. How many would you expect to be
   waiting in line after 20 minutes?

5. Simulate the setting in Exercise 4. Do this by
   generating a list of 20 random numbers with a
   Poisson distribution for μ = 5. Then create a
   table that shows the number of customers
   waiting at the end of 20 minutes.

6. Suppose that the mean number of arrivals
   per minute is 5. What is the probability that
   10 customers will arrive during the first minute?

7. Suppose that the mean number of arrivals per
   minute is 4.

   (a) What is the probability that three, four, or
       five customers will arrive during the third
       minute?

   (b) What is the probability that more than four
       customers will arrive during the first
       minute?

   (c) What is the probability that more than four
       customers will arrive during each of the
       first four minutes?

Extended solutions are given in the Technology Supplement.
Technical instruction is provided for MINITAB, Excel, and the TI-83/84 Plus.
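The bookkeeping behind the "customers waiting" tables can be sketched in a few lines. This is only an illustration under stated assumptions: unserved customers carry over to the next minute, the store serves at most four customers per minute, and the function name `waiting_counts` is hypothetical. The arrival list in the usage lines is made up for illustration and is not the MINITAB data from Exercise 2.

```python
def waiting_counts(arrivals, capacity=4):
    """Return the number of customers still waiting at the end of each minute.

    arrivals -- list of arrival counts, one per minute
    capacity -- customers the store can process per minute (assumed to be 4)
    """
    waiting = 0
    counts = []
    for arrived in arrivals:
        # Everyone in line plus the new arrivals can be served, up to capacity.
        waiting = max(0, waiting + arrived - capacity)
        counts.append(waiting)
    return counts


if __name__ == "__main__":
    # Hypothetical arrivals for six minutes (not taken from the text).
    print(waiting_counts([3, 3, 3, 3, 6, 2]))
```

When three customers arrive in each of the first four minutes, the line stays empty, which agrees with the description in Exercise 2.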
NORMAL
PROBABILITY
DISTRIBUTIONS

5.1 Introduction to Normal Distributions and the Standard Normal
    Distribution
5.2 Normal Distributions: Finding Probabilities
5.3 Normal Distributions: Finding Values
    CASE STUDY
5.4 Sampling Distributions and the Central Limit Theorem
    ACTIVITY
5.5 Normal Approximations to Binomial Distributions
    USES AND ABUSES
    REAL STATISTICS - REAL DECISIONS
    TECHNOLOGY


The bottom shell of an Eastern Box Turtle has 
hinges so the turtle can retract its head, tail, 
and legs into the shell. The shell can also 
regenerate if it has been damaged. 
WHERE YOU'VE BEEN

In Chapters 1 through 4, you learned how to collect and describe data,
find the probability of an event, and analyze discrete probability
distributions. You also learned that if a sample is used to make
inferences about a population, then it is critical that the sample not
be biased. Suppose, for instance, that you wanted to determine the rate
of clinical mastitis (infections caused by bacteria that can alter milk
production) in dairy herds. How would you organize the study? When the
Animal Health Service performed this study, it used random sampling and
then classified the results according to breed, housing, hygiene,
health, milking management, and milking machine. One conclusion from
the study was that herds with Red and White cows as the predominant
breed had a higher rate of clinical mastitis than herds with
Holstein-Friesian cows as the main breed.

WHERE YOU'RE GOING

In Chapter 5, you will learn how to recognize normal (bell-shaped)
distributions and how to use their properties in real-life applications.
Suppose that you worked for the North Carolina Zoo and were collecting
data about various physical traits of Eastern Box Turtles at the zoo.
Which of the following would you expect to have bell-shaped, symmetric
distributions: carapace (top shell) length, plastral (bottom shell)
length, carapace width, plastral width, weight, total length? For
instance, the four graphs below show the carapace length and plastral
length of male and female Eastern Box Turtles in the North Carolina Zoo.
Notice that the male Eastern Box Turtle carapace length distribution is
bell-shaped, but the other three distributions are skewed left.

[Figures: four histograms of percent versus length (in millimeters):
Female Eastern Box Turtle Carapace Length, Female Eastern Box Turtle
Plastral Length, Male Eastern Box Turtle Carapace Length, and Male
Eastern Box Turtle Plastral Length.]
Introduction to Normal Distributions and the
Standard Normal Distribution


WHAT YOU SHOULD LEARN

> How to interpret graphs of normal probability distributions
> How to find areas under the standard normal curve


Properties of a Normal Distribution » The Standard Normal Distribution


INSIGHT

To learn how to determine if a random sample is taken from a normal
distribution, see Appendix C.


INSIGHT

A probability density function has two requirements.

1. The total area under the curve is equal to 1.
2. The function can never be negative.


> PROPERTIES OF A NORMAL DISTRIBUTION


In Section 4.1, you distinguished between discrete and continuous random 
variables, and learned that a continuous random variable has an infinite number 
of possible values that can be represented by an interval on the number line. Its 
probability distribution is called a continuous probability distribution. In this 
chapter, you will study the most important continuous probability distribution in 
statistics—the normal distribution. Normal distributions can be used to model 
many sets of measurements in nature, industry, and business. For instance, the 
systolic blood pressures of humans, the lifetimes of plasma televisions, and even 
housing costs are all normally distributed random variables. 


DEFINITION 


A normal distribution is a continuous probability distribution for a random 
variable x. The graph of a normal distribution is called the normal curve. 
A normal distribution has the following properties. 


1. The mean, median, and mode are equal.

2. The normal curve is bell-shaped and is symmetric about the mean.

3. The total area under the normal curve is equal to 1.

4. The normal curve approaches, but never touches, the x-axis as it extends
   farther and farther away from the mean.

5. Between μ - σ and μ + σ (in the center of the curve), the graph curves
   downward. The graph curves upward to the left of μ - σ and to the right
   of μ + σ. The points at which the curve changes from curving upward to
   curving downward are called inflection points.


[Figure: normal curve with total area = 1, inflection points marked at
μ - σ and μ + σ, and an x-axis labeled μ - 3σ, μ - 2σ, μ - σ, μ,
μ + σ, μ + 2σ, μ + 3σ.]


You have learned that a discrete probability distribution can be graphed with
a histogram. For a continuous probability distribution, you can use a probability
density function (pdf). A normal curve with mean μ and standard deviation σ
can be graphed using the normal probability density function.

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

A normal curve depends completely on the two parameters μ and σ because
e ≈ 2.718 and π ≈ 3.14 are constants.
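The normal probability density function can be evaluated numerically with only the standard library. The following is a minimal sketch; the function name `normal_pdf` is illustrative, and the sample values of μ and σ are chosen only to show the symmetry of the curve.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height y of the normal curve with mean mu and standard deviation sigma at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


if __name__ == "__main__":
    mu, sigma = 3.5, 1.5  # illustrative parameters
    # The curve is symmetric about the mean, so these two heights agree.
    print(normal_pdf(mu - 1.0, mu, sigma), normal_pdf(mu + 1.0, mu, sigma))
    # The highest point of the curve is at the mean.
    print(normal_pdf(mu, mu, sigma))
```

Only μ and σ vary from one normal curve to another; everything else in the formula is a constant.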
A normal distribution can have any mean and any positive standard
deviation. These two parameters, μ and σ, completely determine the shape of
the normal curve. The mean gives the location of the line of symmetry, and the
standard deviation describes how much the data are spread out.


STUDY TIP

Here are instructions for graphing a normal distribution
on a TI-83/84 Plus.

Y= [2nd] DISTR
1: normalpdf(

Enter x and the values of μ and σ separated by commas.


[Figure: three normal curves, each on an x-axis marked from 0 to 7, with
the inflection points labeled.
Curve A: mean μ = 3.5, standard deviation σ = 1.5
Curve B: mean μ = 3.5, standard deviation σ = 0.7
Curve C: mean μ = 1.5, standard deviation σ = 0.7]


Notice that curve A and curve B above have the same mean, and curve B
and curve C have the same standard deviation. The total area under each curve
is 1.


EXAMPLE 1 


» Understanding Mean and Standard Deviation 
1. Which normal curve has a greater mean? 


2. Which normal curve has a greater standard deviation? 


> Solution 


1. The line of symmetry of curve A occurs at x = 15. The line of symmetry 
of curve B occurs at x = 12. So, curve A has a greater mean. 


2. Curve B is more spread out than curve A. So, curve B has a greater 
standard deviation. 


> Try It Yourself 1 


Consider the normal curves shown at the right. Which normal curve has the
greatest mean? Which normal curve has the greatest standard deviation?

a. Find the location of the line of symmetry of each curve. Make a
   conclusion about which mean is greatest.
b. Determine which normal curve is more spread out. Make a conclusion
   about which standard deviation is greatest.
Answer: Page A37

[Figure: three normal curves labeled A, B, and C on an x-axis marked from
30 to 70.]
Once you determine the mean and standard deviation, you can use a
TI-83/84 Plus to graph the normal curve in Example 2.

[TI-83/84 Plus screens using normalpdf( are not reproduced.]


According to one publication, the number of births in the United States
in a recent year was 4,317,000. The weights of the newborns can be
approximated by a normal distribution, as shown by the following graph.
(Adapted from National Center for Health Statistics)

[Figure: Weights of Newborns, a normal curve over weight (in grams).]

What is the mean weight of the newborns? Estimate the standard deviation
of this normal distribution.


EXAMPLE 2 


> Interpreting Graphs of Normal Distributions 


The scaled test scores for the New York State Grade 8 Mathematics Test 
are normally distributed. The normal curve shown below represents this 
distribution. What is the mean test score? Estimate the standard deviation of 
this normal distribution. (Adapted from New York State Education Department) 


[Figure: normal curve of scaled test scores, with the x-axis marked from
550 to 800.]


> Solution


Because a normal curve is symmetric about the mean, you can estimate
that μ ≈ 675. Because the inflection points are one standard deviation
from the mean, you can estimate that σ ≈ 35.

[Figure: the same normal curve of scaled test scores, with the line of
symmetry and the inflection points marked.]


Interpretation The scaled test scores for the New York State Grade 8 
Mathematics Test are normally distributed with a mean of about 675 and a 
standard deviation of about 35. 


> Try It Yourself 2 


The scaled test scores for the New York State Grade 8 English Language Arts 
Test are normally distributed. The normal curve shown below represents this 
distribution. What is the mean test score? Estimate the standard deviation of 
this normal distribution. (Adapted from New York State Education Department) 


[Figure: normal curve of scaled test scores, with the x-axis marked from
550 to 750.]


a. Find the line of symmetry and identify the mean. 
b. Estimate the inflection points and identify the standard deviation. 
Answer: Page A37 
INSIGHT

Because every normal distribution can be transformed to the standard
normal distribution, you can use z-scores and the standard normal curve
to find areas (and therefore probabilities) under any normal curve.


STUDY TIP

It is important that you know the difference between x and z. The random
variable x is sometimes called a raw score and represents values in a
nonstandard normal distribution, whereas z represents values in the
standard normal distribution.


> THE STANDARD NORMAL DISTRIBUTION 


There are infinitely many normal distributions, each with its own mean and 
standard deviation. The normal distribution with a mean of 0 and a standard 
deviation of 1 is called the standard normal distribution. The horizontal scale of 
the graph of the standard normal distribution corresponds to z-scores. In Section 
2.5, you learned that a z-score is a measure of position that indicates the number 
of standard deviations a value lies from the mean. Recall that you can transform 
an x-value to a z-score using the formula 


$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$

Round to the nearest hundredth.
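The transformation from an x-value to a z-score is a one-line computation. The following is a minimal sketch; the function name `z_score` is illustrative, and the sample inputs (a normal curve with mean 500 and standard deviation 100, and one with mean 1.5 and standard deviation 0.25) are used only as examples.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean, rounded to hundredths."""
    if sigma <= 0:
        raise ValueError("the standard deviation must be positive")
    return round((x - mu) / sigma, 2)


if __name__ == "__main__":
    print(z_score(600, 500, 100))   # one standard deviation above the mean
    print(z_score(1, 1.5, 0.25))    # two standard deviations below the mean
```

A positive z-score means the value lies above the mean, and a negative z-score means it lies below the mean.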


DEFINITION 


The standard normal distribution is a normal distribution with a mean of 0 and 
a standard deviation of 1. 


Standard Normal Distribution 


If each data value of a normally distributed random variable x is transformed
into a z-score, the result will be the standard normal distribution. When this
transformation takes place, the area that falls in the interval under the
nonstandard normal curve is the same as that under the standard normal curve
within the corresponding z-boundaries.

In Section 2.4, you learned to use the Empirical Rule to approximate areas
under a normal curve when the values of the random variable x corresponded to
-3, -2, -1, 0, 1, 2, or 3 standard deviations from the mean. Now, you will learn
to calculate areas corresponding to other x-values. After you use the formula
given above to transform an x-value to a z-score, you can use the Standard
Normal Table in Appendix B. The table lists the cumulative area under the
standard normal curve to the left of z for z-scores from -3.49 to 3.49. As you
examine the table, notice the following.


PROPERTIES OF THE STANDARD 
NORMAL DISTRIBUTION 


1. The cumulative area is close to 0 for z-scores close to z = -3.49.
2. The cumulative area increases as the z-scores increase.
3. The cumulative area for z = 0 is 0.5000.
4. The cumulative area is close to 1 for z-scores close to z = 3.49.
STUDY TIP

Here are instructions for finding the area that corresponds to
z = -0.24 on a TI-83/84 Plus. To specify the lower bound in this case,
use -10,000.

[2nd] DISTR
2: normalcdf(
-10000, -.24)
ENTER

[TI-83/84 Plus screen showing the resulting area is not reproduced.]

EXAMPLE 3 


» Using the Standard Normal Table 
1. Find the cumulative area that corresponds to a z-score of 1.15. 


2. Find the cumulative area that corresponds to a z-score of —0.24. 


> Solution 


1. Find the area that corresponds to z = 1.15 by finding 1.1 in the left column 
and then moving across the row to the column under 0.05. The number in 
that row and column is 0.8749. So, the area to the left of z = 1.15 is 0.8749. 


 z     .00    .01    .02    .03    .04    .05    .06
0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239
0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636
0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026

0.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315
1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554
1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770
1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962
1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131
1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279


2. Find the area that corresponds to z = -0.24 by finding -0.2 in the left
   column and then moving across the row to the column under 0.04. The
   number in that row and column is 0.4052. So, the area to the left of
   z = -0.24 is 0.4052.


  z     .09    .08    .07    .06    .05    .04    .03
-3.4   .0002  .0003  .0003  .0003  .0003  .0003  .0003
-3.3   .0003  .0004  .0004  .0004  .0004  .0004  .0004
-3.2   .0005  .0005  .0005  .0006  .0006  .0006  .0006

-0.5   .2776  .2810  .2843  .2877  .2912  .2946  .2981
-0.4   .3121  .3156  .3192  .3228  .3264  .3300  .3336
-0.3   .3483  .3520  .3557  .3594  .3632  .3669  .3707
-0.2   .3859  .3897  .3936  .3974  .4013  .4052  .4090
-0.1   .4247  .4286  .4325  .4364  .4404  .4443  .4483
-0.0   .4641  .4681  .4721  .4761  .4801  .4840  .4880


You can also use a computer or calculator to find the cumulative area that 
corresponds to a z-score, as shown in the margin. 
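On a computer, the cumulative area to the left of a z-score can be obtained from the error function in Python's standard library. This is a minimal sketch, not the method used by any particular calculator; the function name `cumulative_area` is illustrative.

```python
import math


def cumulative_area(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # The two z-scores from Example 3; the table gives 0.8749 and 0.4052.
    print(round(cumulative_area(1.15), 4))
    print(round(cumulative_area(-0.24), 4))
```

Rounded to four decimal places, the computed areas agree with the table entries used in Example 3.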


> Try It Yourself 3 
1. Find the cumulative area that corresponds to a z-score of -2.19.


2. Find the cumulative area that corresponds to a z-score of 2.17. 


Locate the given z-score and find the area that corresponds to it in the 
Standard Normal Table. Answer: Page A37 


When the z-score is not in the table, use the entry closest to it. If the given 
z-score is exactly midway between two z-scores, then use the area midway 
between the corresponding areas. 
You can use the following guidelines to find various types of areas under the
standard normal curve.


GUIDELINES


Finding Areas Under the Standard Normal Curve

1. Sketch the standard normal curve and shade the appropriate area under
   the curve.

2. Find the area by following the directions for each case shown.

   a. To find the area to the left of z, find the area that corresponds to z in
      the Standard Normal Table.

      1. Use the table to find the area for the z-score.
      2. The area to the left of z = 1.23 is 0.8907.

   b. To find the area to the right of z, use the Standard Normal Table to
      find the area that corresponds to z. Then subtract the area from 1.

      1. Use the table to find the area for the z-score.
      2. The area to the left of z = 1.23 is 0.8907.
      3. Subtract to find the area to the right of z = 1.23:
         1 - 0.8907 = 0.1093.

   c. To find the area between two z-scores, find the area corresponding to
      each z-score in the Standard Normal Table. Then subtract the smaller
      area from the larger area.

      1. Use the table to find the areas for the z-scores.
      2. The area to the left of z = 1.23 is 0.8907.
      3. The area to the left of z = -0.75 is 0.2266.
      4. Subtract to find the area of the region between the two z-scores:
         0.8907 - 0.2266 = 0.6641.
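The three cases in the guidelines can be sketched as three small functions. This is an illustration that assumes the cumulative area is computed with the standard library's error function rather than read from the Standard Normal Table, so results can differ from the table values in the fourth decimal place. All function names are hypothetical.

```python
import math


def cumulative_area(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_left(z):
    # Case a: read the cumulative area directly.
    return cumulative_area(z)


def area_right(z):
    # Case b: subtract the cumulative area from the total area of 1.
    return 1.0 - cumulative_area(z)


def area_between(z1, z2):
    # Case c: subtract the smaller cumulative area from the larger one.
    return abs(cumulative_area(z2) - cumulative_area(z1))


if __name__ == "__main__":
    print(round(area_left(1.23), 4))
    print(round(area_right(1.23), 4))
    print(round(area_between(-0.75, 1.23), 4))
```

Taking the absolute value in `area_between` lets the two z-scores be given in either order.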
Using a TI-83/84 Plus, you can find the area automatically.

[TI-83/84 Plus normalcdf( screens are not reproduced.]


INSIGHT

Because the normal distribution is a continuous probability
distribution, the area under the standard normal curve to the left of a
z-score gives the probability that z is less than that z-score. For
instance, in Example 4, the area to the left of z = -0.99 is 0.1611. So,
P(z < -0.99) = 0.1611, which is read as "the probability that z is less
than -0.99 is 0.1611."


Use 10,000 for the upper bound.
EXAMPLE 4


» Finding Area Under the Standard Normal Curve

Find the area under the standard normal curve to the left of z = -0.99.

> Solution

The area under the standard normal curve to the left of z = -0.99 is shown.

[Figure: standard normal curve with the area to the left of z = -0.99
shaded.]

From the Standard Normal Table, this area is equal to 0.1611.

> Try It Yourself 4

Find the area under the standard normal curve to the left of z = 2.13.


a. Draw the standard normal curve and shade the area under the curve and to 
the left of z = 2.13. 

b. Use the Standard Normal Table to find the area that corresponds to 
z= 2.13. Answer: Page A38 


EXAMPLE 5 


» Finding Area Under the Standard Normal Curve 
Find the area under the standard normal curve to the right of z = 1.06.


> Solution
The area under the standard normal curve to the right of z = 1.06 is shown.

[Figure: standard normal curve with the area to the left of z = 1.06
labeled "Area = 0.8554" and the shaded area to the right labeled
"Area = 1 - 0.8554".]

From the Standard Normal Table, the area to the left of z = 1.06 is
0.8554. Because the total area under the curve is 1, the area to the right
of z = 1.06 is

Area = 1 - 0.8554
     = 0.1446.
[TI-83/84 Plus screen for normalcdf(-1.5, 1.25) is not reproduced.]

When using technology, your answers may differ slightly from
those found using the Standard Normal Table.


> Try It Yourself 5 
Find the area under the standard normal curve to the right of z = -2.16.

a. Draw the standard normal curve and shade the area below the curve and to
   the right of z = -2.16.
b. Use the Standard Normal Table to find the area to the left of z = -2.16.
c. Subtract the area from 1.
Answer: Page A38


EXAMPLE 6 


> Finding Area Under the Standard Normal Curve 
Find the area under the standard normal curve between z = -1.5 and z = 1.25.


> Solution

The area under the standard normal curve between z = -1.5 and z = 1.25 is
shown.

[Figure: standard normal curve with the area between z = -1.5 and
z = 1.25 shaded.]

From the Standard Normal Table, the area to the left of z = 1.25 is 0.8944 and
the area to the left of z = -1.5 is 0.0668. So, the area between z = -1.5 and
z = 1.25 is

Area = 0.8944 - 0.0668
     = 0.8276.

Interpretation So, 82.76% of the area under the curve falls between
z = -1.5 and z = 1.25.


> Try It Yourself 6 


Find the area under the standard normal curve between z = -2.165 and
z = -1.35.

a. Use the Standard Normal Table to find the area to the left of z = -1.35.
b. Use the Standard Normal Table to find the area to the left of z = -2.165.
c. Subtract the smaller area from the larger area.
d. Interpret the results.
Answer: Page A38


Recall that in Section 2.4 you learned, using the Empirical Rule, that values
lying more than two standard deviations from the mean are considered unusual.
Values lying more than three standard deviations from the mean are considered
very unusual. So, if a z-score is greater than 2 or less than -2, it is unusual. If a
z-score is greater than 3 or less than -3, it is very unusual.
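The rule of thumb for unusual z-scores translates directly into a small classifier. This is a minimal sketch; the function name `classify_z` and the label strings are illustrative choices, not terminology fixed by the text beyond "unusual" and "very unusual."

```python
def classify_z(z):
    """Label a z-score using the two- and three-standard-deviation cutoffs."""
    if abs(z) > 3:
        return "very unusual"
    if abs(z) > 2:
        return "unusual"
    return "not unusual"


if __name__ == "__main__":
    for z in (1.06, -2.16, 3.25):
        print(z, classify_z(z))
```

The check for "very unusual" comes first because any z-score beyond 3 standard deviations is also beyond 2.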
5.1 EXERCISES


BUILDING BASIC SKILLS AND VOCABULARY

1. Find three real-life examples of a continuous variable. Which do you think
   may be normally distributed? Why?

2. In a normal distribution, which is greater, the mean or the median? Explain.

3. What is the total area under the normal curve?

4. What do the inflection points on a normal distribution represent? Where do
   they occur?

5. Draw two normal curves that have the same mean but different standard
   deviations. Describe the similarities and differences.

6. Draw two normal curves that have different means but the same standard
   deviation. Describe the similarities and differences.

7. What is the mean of the standard normal distribution? What is the standard
   deviation of the standard normal distribution?

8. Describe how you can transform a nonstandard normal distribution to a
   standard normal distribution.

9. Getting at the Concept Why is it correct to say "a" normal distribution and
   "the" standard normal distribution?

10. Getting at the Concept If a z-score is 0, which of the following must be
    true? Explain your reasoning.

    (a) The mean is 0.
    (b) The corresponding x-value is 0.
    (c) The corresponding x-value is equal to the mean.


Graphical Analysis In Exercises 11-16, determine whether the graph could
represent a variable with a normal distribution. Explain your reasoning.

[Graphs for Exercises 11-16 are not reproduced.]
Graphical Analysis In Exercises 17 and 18, determine whether the histogram
represents data with a normal distribution. Explain your reasoning.

17. [Histogram: Waiting Time in a Dentist's Office; horizontal axis: time
    (in minutes).]

18. [Histogram: Weight Loss; horizontal axis: pounds lost.]


USING AND INTERPRETING CONCEPTS


Graphical Analysis In Exercises 19-24, find the area of the indicated region
under the standard normal curve. If convenient, use technology to find the area.

[Graphs for Exercises 19-24 are not reproduced. The only boundary labels
that survive legibly are z = -2.25 and z = -0.5 on the last two graphs.]


Finding Area In Exercises 25-38, find the indicated area under the standard
normal curve. If convenient, use technology to find the area.

25. To the left of z = 0.08            26. To the right of z = -3.16
27. To the left of z = -2.575          28. To the left of z = 1.365
29. To the right of z = -0.65          30. To the right of z = 3.25
31. To the right of z = -0.355         32. To the right of z = 1.615
33. Between z = 0 and z = 2.86         34. Between z = -1.53 and z = 0
35. Between z = -1.96 and z = 1.96
36. Between z = -2.33 and z = 2.33
37. To the left of z = -1.28 and to the right of z = 1.28
38. To the left of z = -1.96 and to the right of z = 1.96


39. Manufacturer Claims You work for a consumer watchdog publication and 
are testing the advertising claims of a tire manufacturer. The manufacturer 
claims that the life spans of the tires are normally distributed, with a mean of 
40,000 miles and a standard deviation of 4000 miles. You test 16 tires and get 
the following life spans. 


48,778 41,046 29,083 36,394 32,302 42,787 41,972 37,229 
25,314 31,920 38,030 38,445 30,750 38,886 36,770 46,049 


(a) Draw a frequency histogram to display these data. Use five classes. Is it 
reasonable to assume that the life spans are normally distributed? Why? 


(b) Find the mean and standard deviation of your sample. 


(c) Compare the mean and standard deviation of your sample with those in 
the manufacturer’s claim. Discuss the differences. 


40. Milk Consumption You are performing a study about weekly per
    capita milk consumption. A previous study found weekly per capita
    milk consumption to be normally distributed, with a mean of 48.7 fluid
    ounces and a standard deviation of 8.6 fluid ounces. You randomly
    sample 30 people and find their weekly milk consumptions to be as
    follows.


40 45 54 41 43 31 47 30 33 37 48 57 52 45 38 
65 25 39 53 51 58 52 40 46 44 48 61 47 49 57 


(a) Draw a frequency histogram to display these data. Use seven 
classes. Is it reasonable to assume that the consumptions are 
normally distributed? Why? 


(b) Find the mean and standard deviation of your sample. 


(c) Compare the mean and standard deviation of your sample with 
those of the previous study. Discuss the differences. 


Computing and Interpreting z-Scores of Normal Distributions In
Exercises 41-44, you are given a normal distribution, the distribution's mean and
standard deviation, four values from that distribution, and a graph of the standard
normal distribution. (a) Without converting to z-scores, match the values with the
letters A, B, C, and D on the given graph of the standard normal distribution.
(b) Find the z-score that corresponds to each value and check your answers to
part (a). (c) Determine whether any of the values are unusual.


41. Blood Pressure The systolic blood pressures of a sample of adults are
    normally distributed, with a mean pressure of 115 millimeters of mercury
    and a standard deviation of 3.6 millimeters of mercury. The systolic blood
    pressures of four adults selected at random are 121 millimeters of mercury,
    113 millimeters of mercury, 105 millimeters of mercury, and 127 millimeters
    of mercury.

    [Figure for Exercise 41: standard normal curve with points A, B, C,
    and D marked on the z-axis.]

42. Cereal Boxes The weights of the contents of cereal boxes are normally
    distributed, with a mean weight of 12 ounces and a standard deviation of
    0.05 ounce. The weights of the contents of four cereal boxes selected at
    random are 12.01 ounces, 11.92 ounces, 12.12 ounces, and 11.99 ounces.

    [Figure for Exercise 42: standard normal curve with points A, B, C,
    and D marked on the z-axis.]

43. SAT Scores The SAT is an exam used by colleges and universities to
    evaluate undergraduate applicants. The test scores are normally distributed.
    In a recent year, the mean test score was 1509 and the standard deviation was
    312. The test scores of four students selected at random are 1924, 1241, 2202,
    and 1392. (Source: The College Board)

    [Figure for Exercise 43: standard normal curve with points A, B, C,
    and D marked on the z-axis.]

44. ACT Scores The ACT is an exam used by colleges and universities to
    evaluate undergraduate applicants. The test scores are normally distributed.
    In a recent year, the mean test score was 21.1 and the standard deviation was
    5.0. The test scores of four students selected at random are 15, 22, 9, and 35.
    (Source: ACT, Inc.)

    [Figure for Exercise 44: standard normal curve with points A, B, C,
    and D marked on the z-axis.]

Graphical Analysis In Exercises 45-50, find the probability of z occurring in
the indicated region. If convenient, use technology to find the probability.

[Graphs for Exercises 45-50 are not reproduced.]


Finding Probabilities In Exercises 51-60, find the indicated probability
using the standard normal distribution. If convenient, use technology to find the
probability.

51. P(z < 1.45)               52. P(z < -0.18)          53. P(z > 2.175)
54. P(z > -1.85)              55. P(-0.89 < z < 0)      56. P(0 < z < 0.525)
57. P(-1.65 < z < 1.65)       58. P(-1.54 < z < 1.54)
59. P(z < -2.58 or z > 2.58)  60. P(z < -1.54 or z > 1.54)


EXTENDING CONCEPTS


61. Writing Draw a normal curve with a mean of 60 and a standard deviation
    of 12. Describe how you constructed the curve and discuss its features.

62. Writing Draw a normal curve with a mean of 450 and a standard deviation
    of 50. Describe how you constructed the curve and discuss its features.

63. Uniform Distribution Another continuous distribution is the uniform
    distribution. An example is f(x) = 1 for 0 ≤ x ≤ 1. The mean of the
    distribution for this example is 0.5 and the standard deviation is
    approximately 0.29. The graph of the distribution for this example is a square
    with the height and width both equal to 1 unit. In general, the density
    function for a uniform distribution on the interval from x = a to x = b is
    given by

    $$f(x) = \frac{1}{b - a}.$$

    The mean is

    $$\frac{a + b}{2}$$

    and the standard deviation is

    $$\sqrt{\frac{(b - a)^2}{12}}.$$

    (a) Verify that the area under the curve is 1.
    (b) Find the probability that x falls between 0.25 and 0.5.
    (c) Find the probability that x falls between 0.3 and 0.7.

64. Uniform Distribution Consider the uniform density function f(x) = 0.1
    for 10 ≤ x ≤ 20. The mean of this distribution is 15 and the standard
    deviation is about 2.89.

    (a) Draw a graph of the distribution and show that the area under the curve
        is 1.
    (b) Find the probability that x falls between 12 and 15.
    (c) Find the probability that x falls between 13 and 18.
Normal Distributions: Finding Probabilities


WHAT YOU SHOULD LEARN

> How to find probabilities for normally distributed variables using a
  table and using technology


Probability and Normal Distributions


[Figures: a normal curve with x-axis values including 600, 700, and 800,
and the standard normal curve with μ = 0, each with a shaded area.]

In Example 1, you can use a TI-83/84 Plus to find the probability
automatically.


STUDY TIP

Another way to write the answer to Example 1 is P(x < 1) = 0.0228.


> PROBABILITY AND NORMAL DISTRIBUTIONS


If a random variable x is normally distributed, you can find the probability that x
will fall in a given interval by calculating the area under the normal curve for the
given interval. To find the area under any normal curve, you can first convert the
upper and lower bounds of the interval to z-scores. Then use the standard normal
distribution to find the area. For instance, consider a normal curve with μ = 500
and σ = 100, as shown at the upper left. The value of x one standard deviation
above the mean is μ + σ = 500 + 100 = 600. Now consider the standard
normal curve shown at the lower left. The value of z one standard deviation
above the mean is μ + σ = 0 + 1 = 1. Because a z-score of 1 corresponds to an
x-value of 600, and areas are not changed with a transformation to a standard
normal curve, the shaded areas in the graphs are equal.


EXAMPLE 1


> Finding Probabilities for Normal Distributions 
A survey indicates that people use their cellular phones an average of 1.5 years 
before buying a new one. The standard deviation is 0.25 year. A cellular phone 
user is selected at random. Find the probability that the user will use their 
current phone for less than 1 year before buying a new one. Assume that the 
variable x is normally distributed. (Adapted from Fonebak) 


> Solution 

The graph shows a normal curve with μ = 1.5 and σ = 0.25 and a shaded area for x less than 1. The z-score that corresponds to 1 year is

$$z = \frac{x - \mu}{\sigma} = \frac{1 - 1.5}{0.25} = -2.$$

The Standard Normal Table shows that P(z < -2) = 0.0228. The probability that the
user will use their cellular phone for less than 1 year before buying a new one is 0.0228.


[Figure: normal curve with μ = 1.5 and σ = 0.25, area for x < 1 shaded; horizontal axis: Age of cellular phone (in years)]


Interpretation So, 2.28% of cellular phone users will use their cellular phone
for less than 1 year before buying a new one. Because 2.28% is less than 5%, 
this is an unusual event. 
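
As a cross-check on the table lookup in Example 1, the sketch below computes the same z-score and probability with Python's standard-library `statistics.NormalDist`. The variable names are ours, and the 5% comparison simply restates the guideline used in the interpretation.

```python
from statistics import NormalDist

phone_life = NormalDist(mu=1.5, sigma=0.25)   # years before buying a new phone

z = (1 - phone_life.mean) / phone_life.stdev  # z-score for x = 1 year
p = phone_life.cdf(1)                         # P(x < 1)

print(z)              # -2.0
print(round(p, 4))    # 0.0228
print(p < 0.05)       # True: unusual by the 5% guideline
```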


> Try It Yourself 1 


The average speed of vehicles traveling on a stretch of highway is 67 miles per 
hour with a standard deviation of 3.5 miles per hour. A vehicle is selected at 
random. What is the probability that it is violating the 70 mile per hour speed 
limit? Assume the speeds are normally distributed. 


a. Sketch a graph. 
b. Find the z-score that corresponds to 70 miles per hour. 
c. Find the area to the right of that z-score. 


d. Interpret the results. Answer: Page A38 
EXAMPLE 2


> Finding Probabilities for Normal Distributions 


A survey indicates that for each trip to the supermarket, a shopper spends an 
average of 45 minutes with a standard deviation of 12 minutes in the store. The 
lengths of time spent in the store are normally distributed and are represented 
by the variable x. A shopper enters the store. (a) Find the probability that the 
shopper will be in the store for each interval of time listed below. (b) Interpret 
your answer if 200 shoppers enter the store. How many shoppers would you 
expect to be in the store for each interval of time listed below? 


1. Between 24 and 54 minutes 2. More than 39 minutes 


> Solution 


1. (a) The graph at the left shows a normal curve with μ = 45 minutes and
σ = 12 minutes. The area for x between 24 and 54 minutes is shaded.
The z-scores that correspond to 24 minutes and to 54 minutes are

$$z_1 = \frac{24 - 45}{12} = -1.75 \quad \text{and} \quad z_2 = \frac{54 - 45}{12} = 0.75.$$

So, the probability that a shopper will be in the store between 24 and
54 minutes is

P(24 < x < 54) = P(-1.75 < z < 0.75)
= P(z < 0.75) - P(z < -1.75)
= 0.7734 - 0.0401 = 0.7333.


[Figure: normal curve with μ = 45 and σ = 12, area between 24 and 54 shaded; horizontal axis: Time (in minutes)]


(b) Interpretation If 200 shoppers enter the store, then you would expect 
200(0.7333) = 146.66, or about 147, shoppers to be in the store between 
24 and 54 minutes. 


2. (a) The graph at the left shows a normal curve with μ = 45 minutes and
σ = 12 minutes. The area for x greater than 39 minutes is shaded. The
z-score that corresponds to 39 minutes is

$$z = \frac{39 - 45}{12} = -0.5.$$

So, the probability that a shopper will be in the store more than
39 minutes is

P(x > 39) = P(z > -0.5) = 1 - P(z < -0.5) = 1 - 0.3085 = 0.6915.


[Figure: normal curve with μ = 45 and σ = 12, area to the right of 39 shaded; horizontal axis: Time (in minutes)]


(b) Interpretation If 200 shoppers enter the store, then you would expect 
200(0.6915) = 138.3, or about 138, shoppers to be in the store more 
than 39 minutes. 
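
Both parts of Example 2 can be reproduced without a table. This is a sketch under the assumption that Python's standard-library `statistics.NormalDist` is an acceptable technology tool; the variable names are illustrative. The results should agree with the table values to about four decimal places.

```python
from statistics import NormalDist

shopping_time = NormalDist(mu=45, sigma=12)   # minutes in the store

# P(24 < x < 54): subtract the smaller cumulative area from the larger one.
p_between = shopping_time.cdf(54) - shopping_time.cdf(24)
# P(x > 39): complement of the cumulative area to the left of 39.
p_more = 1 - shopping_time.cdf(39)

shoppers = 200
print(round(p_between, 4), round(shoppers * p_between))   # 0.7333 147
print(round(p_more, 4), round(shoppers * p_more))         # 0.6915 138
```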


> Try It Yourself 2 


What is the probability that the shopper in Example 2 will be in the
supermarket between 33 and 60 minutes?


a. Sketch a graph. 
b. Find the z-scores that correspond to 33 minutes and 60 minutes. 
c. Find the cumulative area for each z-score and subtract the smaller area from 
the larger area. 
d. Interpret your answer if 150 shoppers enter the store. How many shoppers 
would you expect to be in the store between 33 and 60 minutes? 
Answer: Page A38 
In baseball, a batting average is the number of hits divided by the number of at-bats. The batting averages of all Major League Baseball players in a recent year can be approximated by a normal distribution, as shown in the following graph. The mean of the batting averages is 0.262 and the standard deviation is 0.009. (Adapted from ESPN)


[Figure: Major League Baseball; normal curve with μ = 0.262; horizontal axis: Batting average, marked from 0.24 to 0.28]


What percent of the players have a batting average of 0.270 or greater? If there are 40 players on a roster, how many would you expect to have a batting average of 0.270 or greater?
Another way to find normal probabilities is to use a calculator or a computer.
You can find normal probabilities using MINITAB, Excel, and the TI-83/84 Plus.


EXAMPLE 3 


» Using Technology to Find Normal Probabilities 


Triglycerides are a type of fat in the bloodstream. The mean triglyceride level 
in the United States is 134 milligrams per deciliter. Assume the triglyceride 
levels of the population of the United States are normally distributed, with a 
standard deviation of 35 milligrams per deciliter. You randomly select a person 
from the United States. What is the probability that the person’s triglyceride 
level is less than 80? Use a technology tool to find the probability. (Adapted 
from University of Maryland Medical Center) 


> Solution 


MINITAB, Excel, and the TI-83/84 Plus each have features that allow you to 
find normal probabilities without first converting to standard z-scores. For 
each, you must specify the mean and standard deviation of the population, as 
well as the x-value(s) that determine the interval. 


MINITAB


Cumulative Distribution Function


Normal with mean = 134 and standard deviation = 35


x     P( X <= x )
80    0.0614327


EXCEL


     A
1    =NORMDIST(80,134,35,TRUE)
2    0.06143272


TI-83/84 PLUS


normalcdf(-10000,80,134,35)
.0614327356

From the displays, you can see that the probability that the person’s 
triglyceride level is less than 80 is about 0.0614, or 6.14%. 
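
For readers without one of the three tools, the same cumulative probability can be obtained in Python. This is a sketch using the standard-library `statistics.NormalDist`, which the text does not cover; like the tools above, it takes the mean, the standard deviation, and the x-value directly.

```python
from statistics import NormalDist

triglycerides = NormalDist(mu=134, sigma=35)   # mg/dL

p_below_80 = triglycerides.cdf(80)             # P(x < 80)
print(round(p_below_80, 4))                    # 0.0614
```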


> Try It Yourself 3 


A person from the United States is selected at random. What is the probability 
that the person’s triglyceride level is between 100 and 150? Use a technology 
tool. 


a. Read the user's guide for the technology tool you are using. 
b. Enter the appropriate data to obtain the probability. 


c. Write the result as a sentence. Answer: Page A38 


Example 3 shows only one of several ways to find normal probabilities using
MINITAB, Excel, and the TI-83/84 Plus.
5.2 EXERCISES


BUILDING BASIC SKILLS AND VOCABULARY


Computing Probabilities In Exercises 1-6, assume the random variable x is
normally distributed with mean μ = 174 and standard deviation σ = 20. Find the
indicated probability.


1. P(x < 170)
2. P(x < 200) 
3. P(x > 182) 
4. P(x > 155) 
5. P(160 < x < 170) 
6. P(172 < x < 192) 


Graphical Analysis In Exercises 7-12, assume a member is selected at random
from the population represented by the graph. Find the probability that the
member selected at random is from the shaded area of the graph. Assume the
variable x is normally distributed.


7. SAT Writing Scores: 200 < x < 450
[Figure: normal curve; horizontal axis: Score, marked at 200, 450, and 800]
(Source: The College Board)


8. SAT Math Scores: 670 < x < 800
[Figure: normal curve; horizontal axis: Score, marked at 200, 670, and 800]
(Source: The College Board)


9. U.S. Men Ages 35-44: Total Cholesterol, 220 < x < 255, μ = 209
[Figure: normal curve; horizontal axis: Total cholesterol level (in mg/dL), marked at 75, 220, 255, and 300]
(Adapted from National Center for Health Statistics)


10. U.S. Women Ages 35-44: Total Cholesterol, 190 < x < 215
[Figure: normal curve; horizontal axis: Total cholesterol level (in mg/dL), marked at 100, 190, 215, and 300]
(Adapted from National Center for Health Statistics)


11. Ford Fusion: Braking Distance, 145 < x < 155
[Figure: normal curve; horizontal axis: Braking distance (in feet), marked at 122, 145, and 155]
(Adapted from Consumer Reports)


12. Hyundai Elantra: Braking Distance
[Figure: normal curve; horizontal axis: Braking distance (in feet), marked at 116, 125, and 145]
(Adapted from Consumer Reports)
USING AND INTERPRETING CONCEPTS


Finding Probabilities In Exercises 13-20, find the indicated probabilities.
If convenient, use technology to find the probabilities.


13. Heights of Men A survey was conducted to measure the heights of U.S.
men. In the survey, respondents were grouped by age. In the 20-29 age group,
the heights were normally distributed, with a mean of 69.9 inches and a
standard deviation of 3.0 inches. A study participant is randomly selected.
(Adapted from U.S. National Center for Health Statistics)

(a) Find the probability that his height is less than 66 inches.

(b) Find the probability that his height is between 66 and 72 inches.

(c) Find the probability that his height is more than 72 inches.

(d) Can any of these events be considered unusual? Explain your reasoning.


14. Heights of Women A survey was conducted to measure the heights of U.S.
women. In the survey, respondents were grouped by age. In the 20-29 age
group, the heights were normally distributed, with a mean of 64.3 inches and
a standard deviation of 2.6 inches. A study participant is randomly selected.
(Adapted from U.S. National Center for Health Statistics)

(a) Find the probability that her height is less than 56.5 inches.

(b) Find the probability that her height is between 61 and 67 inches.

(c) Find the probability that her height is more than 70.5 inches.

(d) Can any of these events be considered unusual? Explain your reasoning.


15. ACT English Scores In a recent year, the ACT scores for the English
portion of the test were normally distributed, with a mean of 20.6 and a
standard deviation of 6.3. A high school student who took the English
portion of the ACT is randomly selected. (Source: ACT, Inc.)

(a) Find the probability that the student’s ACT score is less than 15.

(b) Find the probability that the student’s ACT score is between 18 and 25.

(c) Find the probability that the student’s ACT score is more than 34.

(d) Can any of these events be considered unusual? Explain your reasoning.


16. Beagles The weights of adult male beagles are normally distributed, with
a mean of 25 pounds and a standard deviation of 3 pounds. A beagle is
randomly selected.

(a) Find the probability that the beagle’s weight is less than 23 pounds.

(b) Find the probability that the weight is between 24.5 and 25 pounds.

(c) Find the probability that the beagle’s weight is more than 30 pounds.

(d) Can any of these events be considered unusual? Explain your reasoning.


17. Computer Usage A survey was conducted to measure the number of hours
per week adults in the United States spend on their computers. In the
survey, the numbers of hours were normally distributed, with a mean of 7 hours
and a standard deviation of 1 hour. A survey participant is randomly selected.

(a) Find the probability that the number of hours spent on the computer by
the participant is less than 5 hours per week.

(b) Find the probability that the number of hours spent on the computer by
the participant is between 5.5 and 9.5 hours per week.

(c) Find the probability that the number of hours spent on the computer by
the participant is more than 10 hours per week.
18. Utility Bills The monthly utility bills in a city are normally distributed, with
a mean of $100 and a standard deviation of $12. A utility bill is randomly 
selected. 

(a) Find the probability that the utility bill is less than $70. 

(b) Find the probability that the utility bill is between $90 and $120. 

(c) Find the probability that the utility bill is more than $140. 


19. Computer Lab Schedule The times per week a student uses a lab computer
are normally distributed, with a mean of 6.2 hours and a standard deviation
of 0.9 hour. A student is randomly selected. 


(a) Find the probability that the student uses a lab computer less than 
4 hours per week. 


(b) Find the probability that the student uses a lab computer between 5 and 
7 hours per week. 


(c) Find the probability that the student uses a lab computer more than 
8 hours per week. 


20. Health Club Schedule The times per workout an athlete uses a stairclimber
are normally distributed, with a mean of 20 minutes and a standard deviation
of 5 minutes. An athlete is randomly selected. 


(a) Find the probability that the athlete uses a stairclimber for less than 
17 minutes. 


(b) Find the probability that the athlete uses a stairclimber between 
20 and 28 minutes. 


(c) Find the probability that the athlete uses a stairclimber for more than 
30 minutes. 


Using Normal Distributions In Exercises 21-28, answer the questions about
the specified normal distribution.


21. SAT Writing Scores Use the normal distribution of SAT writing scores in
Exercise 7 for which the mean is 493 and the standard deviation is 111.

(a) What percent of the SAT writing scores are less than 600?


(b) If 1000 SAT writing scores are randomly selected, about how many 
would you expect to be greater than 550? 


22. SAT Math Scores Use the normal distribution of SAT math scores in
Exercise 8 for which the mean is 515 and the standard deviation is 116.


(a) What percent of the SAT math scores are less than 500? 


(b) If 1500 SAT math scores are randomly selected, about how many would 
you expect to be greater than 600? 


23. Cholesterol Use the normal distribution of men’s total cholesterol
levels in Exercise 9 for which the mean is 209 milligrams per deciliter and the
standard deviation is 37.8 milligrams per deciliter. 


(a) What percent of the men have a total cholesterol level less than 
225 milligrams per deciliter of blood? 


(b) If 250 U.S. men in the 35-44 age group are randomly selected, about 
how many would you expect to have a total cholesterol level greater 
than 260 milligrams per deciliter of blood? 
24. Cholesterol Use the normal distribution of women’s total cholesterol
levels in Exercise 10 for which the mean is 197 milligrams per deciliter and
the standard deviation is 37.7 milligrams per deciliter.

(a) What percent of the women have a total cholesterol level less than
217 milligrams per deciliter of blood?

(b) If 200 U.S. women in the 35-44 age group are randomly selected, about
how many would you expect to have a total cholesterol level greater
than 185 milligrams per deciliter of blood?


25. Computer Usage Use the normal distribution of computer usage in Exercise 17
for which the mean is 7 hours and the standard deviation is 1 hour.

(a) What percent of the adults spend more than 4 hours per week on their
computer?

(b) If 35 adults in the United States are randomly selected, about how many
would you expect to say they spend less than 5 hours per week on their
computer?


26. Utility Bills Use the normal distribution of utility bills in Exercise 18 for
which the mean is $100 and the standard deviation is $12.

(a) What percent of the utility bills are more than $125?

(b) If 300 utility bills are randomly selected, about how many would you
expect to be less than $90?


27. Battery Life Spans The life spans of batteries are normally distributed, with
a mean of 2000 hours and a standard deviation of 30 hours. What percent of
batteries have a life span that is more than 2065 hours? Would it be unusual
for a battery to have a life span that is more than 2065 hours? Explain your
reasoning.


28. Peanuts Assume the mean annual consumptions of peanuts are normally
distributed, with a mean of 5.9 pounds per person and a standard deviation
of 1.8 pounds per person. What percent of people annually consume less than
3.1 pounds of peanuts per person? Would it be unusual for a person to
consume less than 3.1 pounds of peanuts in a year? Explain your reasoning.


In Exercises 29 and 30, use the StatCrunch normal calculator to find the 
indicated probabilities. 


29. Soft Drink Machine The amounts a soft drink machine is designed to dispense
for each drink are normally distributed, with a mean of 12 fluid ounces and a
standard deviation of 0.2 fluid ounce. A drink is randomly selected.

(a) Find the probability that the drink is less than 11.9 fluid ounces.

(b) Find the probability that the drink is between 11.8 and 11.9 fluid ounces.

(c) Find the probability that the drink is more than 12.3 fluid ounces. Can
this be considered an unusual event? Explain your reasoning.


30. Machine Parts The thicknesses of washers produced by a machine are
normally distributed, with a mean of 0.425 inch and a standard deviation of
0.005 inch. A washer is randomly selected.

(a) Find the probability that the washer is less than 0.42 inch thick.

(b) Find the probability that the washer is between 0.40 and 0.42 inch thick.

(c) Find the probability that the washer is more than 0.44 inch thick. Can
this be considered an unusual event? Explain your reasoning.
NORMAL PROBABILITY DISTRIBUTIONS


EXTENDING CONCEPTS


Control Charts Statistical process control (SPC) is the use of statistics to 
monitor and improve the quality of a process, such as manufacturing an engine 
part. In SPC, information about a process is gathered and used to determine if a 
process is meeting all of the specified requirements. One tool used in SPC is a 
control chart. When individual measurements of a variable x are normally 
distributed, a control chart can be used to detect processes that are possibly out of 
statistical control. Three warning signals that a control chart uses to detect a process 
that may be out of control are as follows. 


(1) A point lies beyond three standard deviations of the mean. 
(2) There are nine consecutive points that fall on one side of the mean. 


(3) At least two of three consecutive points lie more than two standard 
deviations from the mean. 
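

The three warning signals can be expressed as simple checks on standardized observations. The sketch below is one possible reading of the rules as stated above, not a standard SPC implementation: the function name `control_chart_signals` is hypothetical, and signal (3) is applied exactly as worded, without requiring the two points to lie on the same side of the mean.

```python
def control_chart_signals(points, mean, sd):
    """Return the set of warning signals (1, 2, 3) triggered by the observations."""
    z = [(x - mean) / sd for x in points]     # standardize each observation
    signals = set()

    # Signal 1: a point lies beyond three standard deviations of the mean.
    if any(abs(v) > 3 for v in z):
        signals.add(1)

    # Signal 2: nine consecutive points fall on one side of the mean.
    for i in range(len(z) - 8):
        window = z[i:i + 9]
        if all(v > 0 for v in window) or all(v < 0 for v in window):
            signals.add(2)
            break

    # Signal 3: at least two of three consecutive points lie more than
    # two standard deviations from the mean.
    for i in range(len(z) - 2):
        if sum(abs(v) > 2 for v in z[i:i + 3]) >= 2:
            signals.add(3)
            break

    return signals


# Made-up observations for a process with mean 3 and standard deviation 0.2.
print(control_chart_signals([3.1, 2.9, 3.05, 2.95, 3.1], 3, 0.2))   # set()
print(control_chart_signals([3.1, 2.9, 3.05, 3.7, 3.0], 3, 0.2))    # {1}
```

An empty set means none of the three signals fired, so the process appears to be in control for those observations.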


In Exercises 31-34, a control chart is shown. Each chart has horizontal lines drawn
at the mean μ, at μ ± 2σ, and at μ ± 3σ. Determine if the process shown is in
control or out of control. Explain.


31. A gear has been designed to have a diameter of 3 inches. The standard
deviation of the process is 0.2 inch.
[Figure: control chart titled "Gears"; horizontal axis: Observation number]


32. A nail has been designed to have a length of 4 inches. The standard
deviation of the process is 0.12 inch.
[Figure: control chart titled "Nails"; horizontal axis: Observation number]


33. A liquid-dispensing machine has been designed to fill bottles with
1 liter of liquid. The standard deviation of the process is 0.1 liter.
[Figure: control chart titled "Liquid Dispenser"; vertical axis: Liquid dispensed (in liters); horizontal axis: Observation number]


34. An engine part has been designed to have a diameter of 55 millimeters.
The standard deviation of the process is 0.001 millimeter.
[Figure: control chart titled "Engine Part"; vertical axis: Diameter (in millimeters); horizontal axis: Observation number]
Normal Distributions: Finding Values


WHAT YOU SHOULD LEARN

> How to find a z-score given the area under the normal curve

> How to transform a z-score to an x-value

> How to find a specific data value of a normal distribution given the probability


Finding z-Scores > Transforming a z-Score to an x-Value > Finding a Specific
Data Value for a Given Probability


> FINDING z-SCORES


In Section 5.2, you were given a normally distributed random variable x and you
found the probability that x would fall in a given interval by calculating the area
under the normal curve for the given interval.

But what if you are given a probability and want to find a value? For
instance, a university might want to know the lowest test score a student can have
on an entrance exam and still be in the top 10%, or a medical researcher might
want to know the cutoff values for selecting the middle 90% of patients by age.

In this section, you will learn how to find a value given an area under a normal
curve (or a probability), as shown in the following example.


EXAMPLE 1 


» Finding a z-Score Given an Area 
1. Find the z-score that corresponds to a cumulative area of 0.3632.


2. Find the z-score that has 10.75% of the distribution’s area to its right.


> Solution


1. Find the z-score that corresponds to an area of 0.3632 by locating 0.3632 in
the Standard Normal Table. The values at the beginning of the corresponding
row and at the top of the corresponding column give the z-score. For this
area, the row value is -0.3 and the column value is 0.05. So, the z-score
is -0.35.

   z     .09    .08    .07    .06    .05    .04    .03
 -3.4   .0002  .0003  .0003  .0003  .0003  .0003  .0003
 -0.5   .2776  .2810  .2843  .2877  .2912  .2946  .2981
 -0.4   .3121  .3156  .3192  .3228  .3264  .3300  .3336
 -0.3   .3483  .3520  .3557  .3594  .3632  .3669  .3707
 -0.2   .3859  .3897  .3936  .3974  .4013  .4052  .4090

2. Because the area to the right is 0.1075, the cumulative area is
1 - 0.1075 = 0.8925. Find the z-score that corresponds to an area of 0.8925
by locating 0.8925 in the Standard Normal Table. For this area, the row
value is 1.2 and the column value is 0.04. So, the z-score is 1.24.

   z     .00    .01    .02    .03    .04    .05    .06
  0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239
  1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554
  1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770
  1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962
  1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131


You can also use a computer or calculator to find the z-scores that
correspond to the given cumulative areas, as shown in the margin.


STUDY TIP

Here are instructions for finding the z-score that corresponds to a
given area on a TI-83/84 Plus.

2nd DISTR
3: invNorm(
Enter the cumulative area.
ENTER

[TI-83/84 Plus display: invNorm(.3632) returns about -.3499; invNorm(.8925) returns about 1.2399]
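

The inverse lookup can also be done in Python. The sketch below assumes the standard-library `statistics.NormalDist`, whose `inv_cdf` method plays the same role as invNorm( on the calculator: it takes a cumulative area and returns the z-score. For an area given to the right, the area is first converted to a cumulative area.

```python
from statistics import NormalDist

std_normal = NormalDist()                      # mean 0, standard deviation 1

z1 = std_normal.inv_cdf(0.3632)                # cumulative area of 0.3632
z2 = std_normal.inv_cdf(1 - 0.1075)            # 10.75% of the area to the right

print(round(z1, 2), round(z2, 2))              # -0.35 1.24
```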
> Try It Yourself 1


1. Find the z-score that has 96.16% of the distribution’s area to the right. 
2. Find the z-score for which 95% of the distribution’s area lies between 
—z and z. 


a. Determine the cumulative area.
b. Locate the area in the Standard Normal Table.
c. Find the z-score that corresponds to the area. Answer: Page A38


In Section 2.5, you learned that percentiles divide a data set into 100 equal
parts. To find a z-score that corresponds to a percentile, you can use the Standard
Normal Table. Recall that if a value x represents the 83rd percentile P₈₃, then
83% of the data values are below x and 17% of the data values are above x.


STUDY TIP

In most cases, the given area will not be found in the table, so use the entry
closest to it. If the given area is halfway between two area entries, use the
z-score halfway between the corresponding z-scores.


EXAMPLE 2 


» Finding a z-Score Given a Percentile 


Find the z-score that corresponds to each percentile.

1. P₅   2. P₅₀   3. P₉₀


> Solution


1. To find the z-score that corresponds to P₅, find the z-score that corresponds
to an area of 0.05 (see upper figure) by locating 0.05 in the Standard
Normal Table. The areas closest to 0.05 in the table are 0.0495 (z = -1.65)
and 0.0505 (z = -1.64). Because 0.05 is halfway between the two areas in
the table, use the z-score that is halfway between -1.64 and -1.65. So, the
z-score that corresponds to an area of 0.05 is -1.645.


2. To find the z-score that corresponds to P₅₀, find the z-score that
corresponds to an area of 0.5 (see middle figure) by locating 0.5 in the
Standard Normal Table. The area closest to 0.5 in the table is 0.5000, so the
z-score that corresponds to an area of 0.5 is 0.


3. To find the z-score that corresponds to P₉₀, find the z-score that
corresponds to an area of 0.9 (see lower figure) by locating 0.9 in the
Standard Normal Table. The area closest to 0.9 in the table is 0.8997, so the
z-score that corresponds to an area of 0.9 is about 1.28.


[Figures: standard normal curves with shaded cumulative areas. Upper: Area = 0.05, z = -1.645. Middle: Area = 0.5, z = 0. Lower: Area = 0.8997, z = 1.28.]
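

The closest-entry rule from the Study Tip can be mimicked in code. The sketch below is an illustration only: the helper `z_from_table` is hypothetical, it rebuilds a four-decimal table from Python's standard-library `statistics.NormalDist` instead of reading the printed table, and it assumes the table runs from z = -3.49 to z = 3.49. When two entries are equally close, it returns the z-score halfway between them.

```python
from statistics import NormalDist

def z_from_table(area):
    """Mimic a four-decimal Standard Normal Table lookup for a cumulative area."""
    std_normal = NormalDist()
    target = round(area * 10000)               # work in units of 0.0001
    best_gap, best_rows = None, []
    for i in range(-349, 350):                 # z = -3.49, -3.48, ..., 3.49
        entry = round(std_normal.cdf(i / 100) * 10000)
        gap = abs(entry - target)
        if best_gap is None or gap < best_gap:
            best_gap, best_rows = gap, [i]     # new closest entry
        elif gap == best_gap:
            best_rows.append(i)                # equally close entry
    # If entries tie, take the z-score halfway between the first and last.
    return (best_rows[0] + best_rows[-1]) / 200

for area in (0.05, 0.50, 0.90):
    print(area, z_from_table(area))
```

For the three percentiles in Example 2, this lookup returns -1.645, 0.0, and 1.28, matching the table-based answers.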


> Try It Yourself 2 


Find the z-score that corresponds to each percentile.


1. Py 2. Pr 3. Poo 




a. Write the percentile as an area. If necessary, draw a graph of the area to 
visualize the problem. 

b. Locate the area in the Standard Normal Table. If the area is not in the table, 
use the closest area. (See Study Tip above.) 

c. Identify the z-score that corresponds to the area. Answer: Page A38 
> TRANSFORMING A z-SCORE TO AN x-VALUE


Recall that to transform an x-value to a z-score, you can use the formula

$$z = \frac{x - \mu}{\sigma}.$$

This formula gives z in terms of x. If you solve this formula for x, you get a
new formula that gives x in terms of z.

$$
\begin{aligned}
z &= \frac{x - \mu}{\sigma} && \text{Formula for } z \text{ in terms of } x \\
z\sigma &= x - \mu && \text{Multiply each side by } \sigma. \\
\mu + z\sigma &= x && \text{Add } \mu \text{ to each side.} \\
x &= \mu + z\sigma && \text{Interchange sides.}
\end{aligned}
$$


TRANSFORMING A z-SCORE TO AN x-VALUE 


To transform a standard z-score to a data value x in a given population, use
the formula

$$x = \mu + z\sigma.$$


EXAMPLE 3 


» Finding an x-Value Corresponding to a z-Score 


A veterinarian records the weights of cats treated at a clinic. The weights are 
normally distributed, with a mean of 9 pounds and a standard deviation of 
2 pounds. Find the weights x corresponding to z-scores of 1.96, —0.44, and 0. 
Interpret your results. 


> Solution 


The x-value that corresponds to each standard z-score is calculated using the
formula x = μ + zσ.


z = 1.96:  x = 9 + 1.96(2) = 12.92 pounds
z = -0.44: x = 9 + (-0.44)(2) = 8.12 pounds
z = 0:     x = 9 + 0(2) = 9 pounds


Interpretation You can see that 12.92 pounds is above the mean, 8.12 pounds
is below the mean, and 9 pounds is equal to the mean.
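

The transformation in Example 3 is a single line of arithmetic, which the following sketch wraps in a small function. The function name `z_to_x` is ours, chosen for illustration.

```python
def z_to_x(z, mu, sigma):
    """Transform a standard z-score to a data value: x = mu + z * sigma."""
    return mu + z * sigma

mu, sigma = 9, 2                  # cat weights, in pounds
for z in (1.96, -0.44, 0):
    print(z, round(z_to_x(z, mu, sigma), 2))
```

A positive z-score gives a weight above the mean, a negative z-score gives a weight below it, and z = 0 returns the mean itself.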


> Try It Yourself 3 


A veterinarian records the weights of dogs treated at a clinic. The weights are 
normally distributed, with a mean of 52 pounds and a standard deviation of 15 
pounds. Find the weights x corresponding to z-scores of —2.33, 3.10, and 0.58. 
Interpret your results. 


a. Identify μ and σ of the normal distribution.
b. Transform each z-score to an x-value. 
c. Interpret the results. Answer: Page A38 
According to the United States Geological Survey, the mean magnitude of worldwide earthquakes in a recent year was about 3.87. The magnitude of worldwide earthquakes can be approximated by a normal distribution. Assume the standard deviation is 0.81. (Adapted from United States Geological Survey)


[Figure: Worldwide Earthquakes in 2008; normal curve with μ = 3.87; horizontal axis: Magnitude]


Between what two values does the middle 90% of the data lie?


STUDY TIP 


Here are instructions for finding a specific x-value for a given
probability on a TI-83/84 Plus.

2nd DISTR
3: invNorm(
Enter the values for the area under the normal distribution, the specified
mean, and the specified standard deviation separated by commas.
ENTER

invNorm(.9,50,10)
62.81551567


> FINDING A SPECIFIC DATA VALUE FOR A GIVEN 
PROBABILITY 


You can also use the normal distribution to find a specific data value (x-value) 
for a given probability, as shown in Examples 4 and 5. 


EXAMPLE 4 (Report 21)


» Finding a Specific Data Value 

Scores for the California Peace Officer Standards and Training test are 
normally distributed, with a mean of 50 and a standard deviation of 10. An 
agency will only hire applicants with scores in the top 10%. What is the lowest 
score you can earn and still be eligible to be hired by the agency? (Source: State 
of California) 


> Solution 


Exam scores in the top 10% correspond to the shaded region shown.


[Figure: normal curve with the top 10% shaded; horizontal axis: Test score, with x = 50 at z = 0 and the unknown score at z = 1.28]


A test score in the top 10% is any score above the 90th percentile. To find the
score that represents the 90th percentile, you must first find the z-score that
corresponds to a cumulative area of 0.9. From the Standard Normal Table, you
can find that the area closest to 0.9 is 0.8997. So, the z-score that corresponds
to an area of 0.9 is z = 1.28. Using the equation x = μ + zσ, you have

$$x = \mu + z\sigma = 50 + 1.28(10) = 62.8.$$


Interpretation The lowest score you can earn and still be eligible to be hired 
by the agency is about 63. 
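

The two routes to the cutoff score, the table z-score and a direct inverse lookup, can be compared in a few lines. This is a sketch assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=50, sigma=10)

z_table = 1.28                                  # table z-score for an area of 0.9
x_table = scores.mean + z_table * scores.stdev  # x = mu + z * sigma
x_exact = scores.inv_cdf(0.90)                  # 90th percentile found directly

print(round(x_table, 1), round(x_exact, 2))     # 62.8 62.82
```

Both values round to the same whole-number score, 63.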


> Try It Yourself 4 


The braking distances of a sample of Nissan Altimas are normally distributed, 
with a mean of 129 feet and a standard deviation of 5.18 feet. What is the 
longest braking distance one of these Nissan Altimas could have and still be in 
the bottom 1%? (Adapted from Consumer Reports) 


a. Sketch a graph. 

b. Find the z-score that corresponds to the given area. 

c. Find x using the equation x = μ + zσ.

d. Interpret the result. Answer: Page A38 
EXAMPLE 5 (Report 22)


> Finding a Specific Data Value 


In a randomly selected sample of women ages 20-34, the mean total 
cholesterol level is 188 milligrams per deciliter with a standard deviation of 
41.3 milligrams per deciliter. Assume the total cholesterol levels are normally 
distributed. Find the highest total cholesterol level a woman in this 20-34 age 
group can have and still be in the bottom 1%. (Adapted from National Center for 
Health Statistics) 


> Solution 
Total cholesterol levels in the lowest 1% correspond to the shaded region
shown.

[Figure: Total Cholesterol Levels in Women Ages 20-34; normal curve with the lowest 1% shaded; horizontal axis: Total cholesterol level (in mg/dL), with x = 188 at z = 0 and the unknown level at z = -2.33]

A total cholesterol level in the lowest 1% is any level below the 1st percentile.
To find the level that represents the 1st percentile, you must first find the
z-score that corresponds to a cumulative area of 0.01. From the Standard
Normal Table, you can find that the area closest to 0.01 is 0.0099. So, the
z-score that corresponds to an area of 0.01 is z = -2.33. Using the equation
x = μ + zσ, you have

$$x = \mu + z\sigma = 188 + (-2.33)(41.3) \approx 91.77.$$

Using a TI-83/84 Plus, you can find the highest total cholesterol level automatically.

[TI-83/84 Plus display: invNorm(.01,188,41.3) returns about 91.92]


Interpretation The value that separates the lowest 1% of total cholesterol 
levels for women in the 20-34 age group from the highest 99% is about 
92 milligrams per deciliter. 
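

The small gap between the table answer (about 91.77) and the calculator answer (about 91.92) comes from rounding the z-score to -2.33. The sketch below shows both, assuming Python's standard-library `statistics.NormalDist`; the helper name `cutoff` is hypothetical.

```python
from statistics import NormalDist

def cutoff(mu, sigma, area_to_left):
    """Data value with the given cumulative area to its left."""
    return NormalDist(mu, sigma).inv_cdf(area_to_left)

x_exact = cutoff(188, 41.3, 0.01)       # highest level still in the bottom 1%
x_table = 188 + (-2.33) * 41.3          # same cutoff using the table z-score

print(round(x_exact, 2), round(x_table, 2))   # 91.92 91.77
```

Either way, the cutoff rounds to about 92 milligrams per deciliter.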


> Try It Yourself 5 


The lengths of time employees have worked at a corporation are normally 
distributed, with a mean of 11.2 years and a standard deviation of 2.1 years. 
In a company cutback, the lowest 10% in seniority are laid off. What is 
the maximum length of time an employee could have worked and still be 
laid off? 


a. Sketch a graph. 

b. Find the z-score that corresponds to the given area. 

c. Find x using the equation x = μ + zσ.

d. Interpret the result. Answer: Page A38 


BUILDING BASIC SKILLS AND VOCABULARY


In Exercises 1-16, use the Standard Normal Table to find the z-score that 
corresponds to the given cumulative area or percentile. If the area is not in the 
table, use the entry closest to the area. If the area is halfway between two entries, 
use the z-score halfway between the corresponding z-scores. If convenient, use 
technology to find the z-score. 


1. 0.2090 2. 0.4364 3. 0.9916 4. 0.7995 
5. 0.05 6. 0.85 7. 0.94 8. 0.0046
9. Ps 10. Py 11. Pas 12, Ps 
13. Py; 14. Py 15. Py; 16. Px 


Graphical Analysis In Exercises 17-22, find the indicated z-score(s) shown in 
the graph. If convenient, use technology to find the z-score(s). 


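[Figures for Exercises 17-22: standard normal curves, each with a shaded area and an unknown z-score. Recoverable labels: Exercise 17, Area = 0.3520; Exercise 18, Area = 0.5987; Exercise 20, Area = 0.0233.]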


In Exercises 23-30, find the indicated z-score. 

23. Find the z-score that has 11.9% of the distribution’s area to its left. 
24. Find the z-score that has 78.5% of the distribution’s area to its left. 
25. Find the z-score that has 11.9% of the distribution’s area to its right. 
26. Find the z-score that has 78.5% of the distribution’s area to its right. 


27. Find the z-score for which 80% of the distribution’s area lies between \(-z\)
and \(z\).

28. Find the z-score for which 99% of the distribution’s area lies between \(-z\)
and \(z\).

29. Find the z-score for which 5% of the distribution’s area lies between \(-z\)
and \(z\).

30. Find the z-score for which 12% of the distribution’s area lies between \(-z\)
and \(z\).


USING AND INTERPRETING CONCEPTS


Using Normal Distributions In Exercises 31-36, answer the questions about
the specified normal distribution.


31. Heights of Women In a survey of women in the United States (ages 20-29),
the mean height was 64.3 inches with a standard deviation of 2.6 inches. 
(Adapted from National Center for Health Statistics) 

(a) What height represents the 95th percentile? 
(b) What height represents the first quartile? 

32. Heights of Men In a survey of men in the United States (ages 20-29), the
mean height was 69.9 inches with a standard deviation of 3.0 inches. (Adapted 
from National Center for Health Statistics) 

(a) What height represents the 90th percentile? 
(b) What height represents the first quartile? 

33. Heart Transplant Waiting Times The time spent (in days) waiting for a
heart transplant for people ages 35-49 in a recent year can be approximated
by a normal distribution, as shown in the graph. (Adapted from Organ
Procurement and Transplantation Network)

(a) What waiting time represents the 5th percentile?
(b) What waiting time represents the third quartile?

[Figure for Exercise 33: Time Spent Waiting for a Heart; \(\mu = 204\) days, \(\sigma = 25.7\) days; horizontal axis: days]

34. Kidney Transplant Waiting Times The time spent (in days) waiting for a
kidney transplant for people ages 35-49 in a recent year can be approximated
by a normal distribution, as shown in the graph. (Adapted from Organ
Procurement and Transplantation Network)

(a) What waiting time represents the 80th percentile?
(b) What waiting time represents the first quartile?

[Figure for Exercise 34: Time Spent Waiting for a Kidney; \(\mu = 1674\) days, \(\sigma = 212.5\) days; horizontal axis: days]

35. Sleeping Times of Medical Residents The average time spent sleeping
(in hours) for a group of medical residents at a hospital can be approximated
by a normal distribution, as shown in the graph. (Source: National Institute of
Occupational Safety and Health, Japan)

(a) What is the shortest time spent sleeping that would still place a resident
in the top 5% of sleeping times?
(b) Between what two values does the middle 50% of the sleep times lie?

[Figure for Exercise 35: Sleeping Times of Medical Residents; \(\mu = 6.1\) hours, \(\sigma = 1.0\) hour; horizontal axis: hours, 3 to 9]
36. Ice Cream The annual per capita consumption of ice cream (in pounds) in
the United States can be approximated by a normal distribution, as shown in
the graph. (Adapted from U.S. Department of Agriculture)

(a) What is the largest annual per capita consumption of ice cream that can
be in the bottom 10% of consumptions?

(b) Between what two values does the middle 80% of the consumptions lie?

[Figure for Exercise 36: Annual U.S. per Capita Ice Cream Consumption; horizontal axis: consumption (in pounds), 8 to 32]

[Figure for Exercise 42: Final Exam Grades; horizontal axis: points scored on final exam]


37. Bags of Baby Carrots The weights of bags of baby carrots are normally
distributed, with a mean of 32 ounces and a standard deviation of 0.36 ounce.
Bags in the upper 4.5% are too heavy and must be repackaged. What is the
most a bag of baby carrots can weigh and not need to be repackaged?


38. Vending Machine A vending machine dispenses coffee into an eight-ounce
cup. The amounts of coffee dispensed into the cup are normally distributed,
with a standard deviation of 0.03 ounce. You can allow the cup to overfill 1%
of the time. What amount should you set as the mean amount of coffee to
be dispensed?


In Exercises 39 and 40, use the StatCrunch normal calculator to find the 
indicated values. 


39. Apples The annual per capita consumption of fresh apples (in pounds) in
the United States can be approximated by a normal distribution, with a mean
of 16.2 pounds and a standard deviation of 4 pounds. (Adapted from U.S.
Department of Agriculture)


(a) What is the smallest annual per capita consumption of apples that can be 
in the top 25% of consumptions? 


(b) What is the largest annual per capita consumption of apples that can be 
in the bottom 15% of consumptions? 


40. Oranges The annual per capita consumption of fresh oranges (in pounds)
in the United States can be approximated by a normal distribution, with a
mean of 9.9 pounds and a standard deviation of 2.5 pounds. (Adapted from
U.S. Department of Agriculture)


(a) What is the smallest annual per capita consumption of oranges that can 
be in the top 10% of consumptions? 


(b) What is the largest annual per capita consumption of oranges that can be 
in the bottom 5% of consumptions? 


EXTENDING CONCEPTS 


41. Writing a Guarantee You sell a brand of automobile tire that has a life
expectancy that is normally distributed, with a mean life of 30,000 miles and
a standard deviation of 2500 miles. You want to give a guarantee for free
replacement of tires that don’t wear well. How should you word your
guarantee if you are willing to replace approximately 10% of the tires?


42. Statistics Grades In a large section of a statistics class, the points for
the final exam are normally distributed, with a mean of 72 and a standard
deviation of 9. Grades are to be assigned according to the following rule:
the top 10% receive A’s, the next 20% receive B’s, the middle 40% receive
C’s, the next 20% receive D’s, and the bottom 10% receive F’s. Find the
lowest score on the final exam that would qualify a student for an A, a B,
a C, and a D.

Birth Weights in America


The National Center for Health Statistics (NCHS) keeps records of many health-related aspects of 
people, including the birth weights of all babies born in the United States. 
The birth weight of a baby is related to its gestation period (the time between conception and birth). 
For a given gestation period, the birth weights can be approximated by a normal distribution. The means 
and standard deviations of the birth weights for various gestation periods are shown in the table below. 
One of the many goals of the NCHS is to reduce the percentage of babies born with low birth 
weights. As you can see from the graph below, the problem of low birth weights increased from 1992
to 2006.

Gestation period     Mean birth weight   Standard deviation
Under 28 weeks       1.90 lb             1.22 lb
28 to 31 weeks       4.12 lb             1.87 lb
32 to 33 weeks       5.14 lb             1.57 lb
34 to 36 weeks       6.19 lb             1.29 lb
37 to 39 weeks       7.29 lb             1.08 lb
40 weeks             7.66 lb             1.04 lb
41 weeks             7.75 lb             1.07 lb
42 weeks and over    7.57 lb             1.11 lb


EXERCISES


1. The distributions of birth weights for three 
gestation periods are shown. Match the 
curves with the gestation periods. Explain 
your reasoning. 


[Figure for Exercise 1: three normal curves; horizontal axes in pounds]

[Graph: percent of preterm births and percent of low birth weights by year, 1992 to 2006. Preterm = under 37 weeks. Low birth weight = under 5.5 pounds.]

2. What percent of the babies born within each
gestation period have a low birth weight
(under 5.5 pounds)? Explain your reasoning.

(a) Under 28 weeks (b) 32 to 33 weeks
(c) 40 weeks (d) 42 weeks and over

3. Describe the weights of the top 10% of the
babies born within each gestation period.
Explain your reasoning.

(a) Under 28 weeks (b) 34 to 36 weeks
(c) 41 weeks (d) 42 weeks and over

4. For each gestation period, what is the probability
that a baby will weigh between 6 and 9
pounds at birth?

(a) Under 28 weeks (b) 28 to 31 weeks
(c) 34 to 36 weeks (d) 37 to 39 weeks

5. A birth weight of less than 3.25 pounds is
classified by the NCHS as a “very low birth
weight.” What is the probability that a baby
has a very low birth weight for each gestation
period?

(a) Under 28 weeks (b) 28 to 31 weeks
(c) 32 to 33 weeks (d) 37 to 39 weeks

WHAT YOU SHOULD LEARN

» How to find sampling distributions and verify their properties
» How to interpret the Central Limit Theorem
» How to apply the Central Limit Theorem to find the probability of a sample mean


INSIGHT 


Sample means can vary 
from one another and 
can also vary from the 
population mean. This 
type of variation is to 
be expected and is 
called sampling error. 


Sampling Distributions and the Central Limit Theorem 


Sampling Distributions » The Central Limit Theorem » Probability and
the Central Limit Theorem


> SAMPLING DISTRIBUTIONS 


In previous sections, you studied the relationship between the mean of a 
population and values of a random variable. In this section, you will study 
the relationship between a population mean and the means of samples taken 
from the population. 


DEFINITION 


A sampling distribution is the probability distribution of a sample statistic 
that is formed when samples of size n are repeatedly taken from a 
population. If the sample statistic is the sample mean, then the distribution is 
the sampling distribution of sample means. Every sample statistic has a 
sampling distribution. 


For instance, consider the following Venn diagram. The rectangle represents 
a large population, and each circle represents a sample of size n. Because the 
sample entries can differ, the sample means can also differ. The mean of Sample 1
is \(\bar{x}_1\); the mean of Sample 2 is \(\bar{x}_2\); and so on. The sampling distribution of the
sample means for samples of size n for this population consists of \(\bar{x}_1\), \(\bar{x}_2\), \(\bar{x}_3\), and
so on. If the samples are drawn with replacement, an infinite number of samples
can be drawn from the population.

[Figure: Venn diagram of a population with \(\mu\) and \(\sigma\); each circle is a sample of size n with its own sample mean.]


PROPERTIES OF SAMPLING DISTRIBUTIONS OF 
SAMPLE MEANS 


1. The mean of the sample means \(\mu_{\bar{x}}\) is equal to the population mean \(\mu\).

\[ \mu_{\bar{x}} = \mu \]

2. The standard deviation of the sample means \(\sigma_{\bar{x}}\) is equal to the population
standard deviation \(\sigma\) divided by the square root of the sample size n.

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

The standard deviation of the sampling distribution of the sample means is
called the standard error of the mean.

EXAMPLE 1

» A Sampling Distribution of Sample Means

You write the population values {1, 3, 5, 7} on slips of paper and put them in
a box. Then you randomly choose two slips of paper, with replacement. List all
possible samples of size n = 2 and calculate the mean of each. These means
form the sampling distribution of the sample means. Find the mean, variance,
and standard deviation of the sample means. Compare your results with the
mean \(\mu = 4\), variance \(\sigma^2 = 5\), and standard deviation \(\sigma = \sqrt{5} \approx 2.236\) of
the population.

[Figure: Probability Histogram of Population of x; horizontal axis: population values; vertical axis: probability P(x)]

> Solution
List all 16 samples of size 2 from the population and the mean of each sample.

Sample   Sample mean      Sample   Sample mean
1, 1     1                5, 1     3
1, 3     2                5, 3     4
1, 5     3                5, 5     5
1, 7     4                5, 7     6
3, 1     2                7, 1     4
3, 3     3                7, 3     5
3, 5     4                7, 5     6
3, 7     5                7, 7     7

After constructing a probability distribution of the sample means, you can
graph the sampling distribution using a probability histogram. Notice that
the shape of the histogram is bell-shaped and symmetric, similar to a normal curve.

Probability Distribution of Sample Means

Sample mean   f   Probability
1             1   1/16 = 0.0625
2             2   2/16 = 0.1250
3             3   3/16 = 0.1875
4             4   4/16 = 0.2500
5             3   3/16 = 0.1875
6             2   2/16 = 0.1250
7             1   1/16 = 0.0625

[Figure: Probability Histogram of Sampling Distribution of \(\bar{x}\); horizontal axis: sample mean, 1 to 7; vertical axis: probability \(P(\bar{x})\)]

The mean, variance, and standard deviation of the 16 sample means are

\[ \mu_{\bar{x}} = 4, \quad (\sigma_{\bar{x}})^2 = \frac{5}{2} = 2.5, \quad \text{and} \quad \sigma_{\bar{x}} = \sqrt{\frac{5}{2}} = \sqrt{2.5} \approx 1.581. \]

These results satisfy the properties of sampling distributions because

\[ \mu_{\bar{x}} = \mu = 4 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{5}}{\sqrt{2}} \approx 1.581. \]

To explore this topic further, see Activity 5.4 on page 280.

STUDY TIP

Review Section 4.1 to find the mean and standard deviation of a
probability distribution.
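
The enumeration in Example 1 can be checked by brute force. The following is a minimal Python sketch, not part of the original text, that lists every ordered sample of size 2 drawn with replacement from {1, 3, 5, 7} and compares the variance of the sample means with the population variance divided by n. The variable names are illustrative.

```python
from itertools import product
from math import isclose, sqrt
from statistics import fmean, pvariance

population = [1, 3, 5, 7]
n = 2  # sample size

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples).
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

# Population parameters.
mu = fmean(population)
sigma_sq = pvariance(population)

# Mean, variance, and standard deviation of the sample means.
mu_xbar = fmean(sample_means)
var_xbar = pvariance(sample_means)
sd_xbar = sqrt(var_xbar)

print("number of samples:", len(samples))
print("mean of sample means:", mu_xbar)
print("variance of sample means:", var_xbar)
print("standard deviation of sample means:", round(sd_xbar, 3))
print("variance equals sigma^2 / n:", isclose(var_xbar, sigma_sq / n))
```

Changing `n` to 3 produces the 64 samples asked for in Try It Yourself 1.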


> Try It Yourself 1 


List all possible samples of n = 3, with replacement, from the population 
{1, 3, 5, 7}. Calculate the mean, variance, and standard deviation of the 
sample means. Compare these values with the corresponding population 
parameters. 


a. Form all possible samples of size 3 and find the mean of each. 

b. Make a probability distribution of the sample means and find the mean, 
variance, and standard deviation. 

c. Compare the mean, variance, and standard deviation of the sample means
with those of the population. Answer: Page A38

INSIGHT

The distribution of sample means has the same mean as the population. But its
standard deviation is less than the standard deviation of the population. This
tells you that the distribution of sample means has the same center as the
population, but it is not as spread out. Moreover, the distribution of sample
means becomes less and less spread out (tighter concentration about the mean)
as the sample size n increases.


> THE CENTRAL LIMIT THEOREM


The Central Limit Theorem forms the foundation for the inferential branch 
of statistics. This theorem describes the relationship between the sampling 
distribution of sample means and the population that the samples are taken from. 
The Central Limit Theorem is an important tool that provides the information 
you'll need to use sample statistics to make inferences about a population mean. 


THE CENTRAL LIMIT THEOREM 


1. If samples of size n, where \(n \ge 30\), are drawn from any population with a
mean \(\mu\) and a standard deviation \(\sigma\), then the sampling distribution of
sample means approximates a normal distribution. The greater the sample
size, the better the approximation.


2. If the population itself is normally distributed, then the sampling
distribution of sample means is normally distributed for any sample size n.


In either case, the sampling distribution of sample means has a mean equal to
the population mean.

\[ \mu_{\bar{x}} = \mu \qquad \text{Mean} \]

The sampling distribution of sample means has a variance equal to 1/n times
the variance of the population and a standard deviation equal to the
population standard deviation divided by the square root of n.

\[ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \qquad \text{Variance} \]

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \qquad \text{Standard deviation} \]

Recall that the standard deviation of the sampling distribution of the sample
means, \(\sigma_{\bar{x}}\), is also called the standard error of the mean.
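
The first part of the theorem can be illustrated with a simulation. The following is a minimal Python sketch, not part of the original text. It assumes a skewed population (exponential with rate 1, so that the mean and the standard deviation are both 1), a sample size of 30, and 10,000 repeated samples; all of these choices, and the fixed seed, are assumptions made only for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(2024)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
MU = 1.0
SIGMA = 1.0
n = 30                # sample size
num_samples = 10_000  # number of samples drawn from the population

# Draw the samples and record the mean of each one.
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

observed_mean = fmean(sample_means)
observed_sd = pstdev(sample_means)
standard_error = SIGMA / sqrt(n)  # sigma / sqrt(n) from the theorem

print(f"mean of sample means: {observed_mean:.3f} (population mean {MU})")
print(f"sd of sample means:   {observed_sd:.3f} (sigma/sqrt(n) = {standard_error:.3f})")
```

A simulation only approximates the sampling distribution, so the observed values should be close to, but not exactly equal to, the theoretical mean and standard error.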


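[Figure: 1. Any Population Distribution, with mean \(\mu\) and standard deviation \(\sigma\), and the corresponding Distribution of Sample Means for \(n \ge 30\). 2. Normal Population Distribution, with mean \(\mu\) and standard deviation \(\sigma\), and the corresponding Distribution of Sample Means (any n). In both cases the distribution of sample means has mean \(\mu_{\bar{x}} = \mu\) and standard deviation \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\).]
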
EXAMPLE 2


> Interpreting the Central Limit Theorem 


Cellular phone bills for residents of a city have a mean of $63 and a standard 
deviation of $11, as shown in the following graph. Random samples of 
100 cellular phone bills are drawn from this population and the mean of each 
sample is determined. Find the mean and standard error of the mean of the 
sampling distribution. Then sketch a graph of the sampling distribution of 
sample means. (Adapted from JD Power and Associates) 


[Figure: Distribution for All Cellular Phone Bills; horizontal axis: individual cellular phone bills (in dollars), 41 to 85]


> Solution 


The mean of the sampling distribution is equal to the population mean, and
the standard error of the mean is equal to the population standard deviation
divided by \(\sqrt{n}\). So,

\[ \mu_{\bar{x}} = \mu = 63 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{11}{\sqrt{100}} = 1.1. \]

Interpretation From the Central Limit Theorem, because the sample size is
greater than 30, the sampling distribution can be approximated by a normal
distribution with \(\mu = \$63\) and \(\sigma = \$1.10\), as shown in the graph below.


[Figure: Distribution of Sample Means with n = 100; horizontal axis: mean of 100 phone bills (in dollars), 41 to 85]


> Try It Yourself 2 

Suppose random samples of size 64 are drawn from the population in 
Example 2. Find the mean and standard error of the mean of the sampling 
distribution. Sketch a graph of the sampling distribution and compare it with 
the sampling distribution in Example 2. 


a. Find \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\).

b. Identify the sample size. If \(n \ge 30\), sketch a normal curve with mean \(\mu_{\bar{x}}\) and
standard deviation \(\sigma_{\bar{x}}\).

c. Compare the results with those in Example 2. Answer: Page A39 
EXAMPLE 3


> Interpreting the Central Limit Theorem


Suppose the training heart rates of all 20-year-old athletes are normally
distributed, with a mean of 135 beats per minute and standard deviation of
18 beats per minute, as shown in the following graph. Random samples of size
4 are drawn from this population, and the mean of each sample is determined.
Find the mean and standard error of the mean of the sampling distribution.
Then sketch a graph of the sampling distribution of sample means.

[Figure: Distribution of Population Training Heart Rates; horizontal axis: rate (in beats per minute), centered at 135]

> Solution

The mean of the sampling distribution is equal to the population mean, and
the standard error of the mean is equal to the population standard deviation
divided by \(\sqrt{n}\). So,

\[ \mu_{\bar{x}} = \mu = 135 \text{ beats per minute} \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{18}{\sqrt{4}} = 9 \text{ beats per minute}. \]

Interpretation From the Central Limit Theorem, because the population is
normally distributed, the sampling distribution of the sample means is also
normally distributed, as shown in the graph below.

[Figure: Distribution of Sample Means with n = 4; horizontal axis: mean rate (in beats per minute), 85 to 185]

In a recent year, there were about 4.8 million parents in the United States
who received child support payments. The following histogram shows the
distribution of children per custodial parent. The mean number of children
was 1.7 and the standard deviation was 0.8. (Adapted from U.S. Census Bureau)

[Figure: Child Support; probability histogram of P(x) against number of children, 1 to 7]

You randomly select 35 parents who receive child support and ask how many
children in their custody are receiving child support payments. What is the
probability that the mean of the sample is between 1.5 and 1.9 children?
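
The calculations in Examples 2 and 3 follow the same two formulas. The following is a minimal Python sketch, not part of the original text, that wraps them in a helper; the function name `sampling_distribution` is hypothetical.

```python
from math import sqrt


def sampling_distribution(mu, sigma, n):
    """Return (mean, standard error) of the sampling distribution of
    sample means for samples of size n."""
    return mu, sigma / sqrt(n)


# Example 2: cellular phone bills, mu = 63, sigma = 11, n = 100.
phone_mean, phone_se = sampling_distribution(63, 11, 100)

# Example 3: training heart rates, mu = 135, sigma = 18, n = 4.
heart_mean, heart_se = sampling_distribution(135, 18, 4)

print("Example 2:", phone_mean, phone_se)
print("Example 3:", heart_mean, heart_se)
```

The helper only computes the two parameters. Whether the sampling distribution can be treated as normal still depends on the conditions of the Central Limit Theorem: a sample size of at least 30 in Example 2, and a normally distributed population in Example 3.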


> Try It Yourself 3 


The diameters of fully grown white oak trees are normally distributed, with a 
mean of 3.5 feet and a standard deviation of 0.2 foot, as shown in the graph 
below. Random samples of size 16 are drawn from this population, and the mean 
of each sample is determined. Find the mean and standard error of the mean 
of the sampling distribution. Then sketch a graph of the sampling distribution. 


[Figure: Distribution of Population Diameters; horizontal axis: diameter (in feet), 2.9 to 4.1 in steps of 0.2]

a. Find \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\).
b. Sketch a normal curve with mean \(\mu_{\bar{x}}\) and standard deviation \(\sigma_{\bar{x}}\).
Answer: Page A39 
> PROBABILITY AND THE CENTRAL LIMIT THEOREM


In Section 5.2, you learned how to find the probability that a random variable x
will fall in a given interval of population values. In a similar manner, you can find
the probability that a sample mean \(\bar{x}\) will fall in a given interval of the \(\bar{x}\) sampling
distribution. To transform \(\bar{x}\) to a z-score, you can use the formula

\[ z = \frac{\text{Value} - \text{Mean}}{\text{Standard error}} = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}. \]

[Figures for Example 4: Distribution of Sample Means with n = 50, horizontal axis: mean time (in minutes), with the area between 24.7 and 25.5 shaded; z-Score Distribution of Sample Means with n = 50]

In Example 4, you can use a TI-83/84 Plus to find the probability
automatically once the standard error of the mean is calculated.



EXAMPLE 4


» Finding Probabilities for Sampling Distributions


The graph at the right shows the lengths of time people spend driving each
day. You randomly select 50 drivers ages 15 to 19. What is the probability
that the mean time they spend driving each day is between 24.7 and
25.5 minutes? Assume that \(\sigma = 1.5\) minutes.

[Figure: Time behind the wheel. The average time spent driving each day, by age group (15-19 through 65+). Source: U.S. Department of Transportation]

> Solution

The sample size is greater than 30, so you can use the Central Limit Theorem
to conclude that the distribution of sample means is approximately normal,
with a mean and a standard deviation of

\[ \mu_{\bar{x}} = \mu = 25 \text{ minutes} \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{50}} \approx 0.21213 \text{ minute}. \]

The graph of this distribution is shown at the left with a shaded area between
24.7 and 25.5 minutes. The z-scores that correspond to sample means of 24.7
and 25.5 minutes are

\[ z_1 = \frac{24.7 - 25}{1.5/\sqrt{50}} = \frac{-0.3}{0.21213} \approx -1.41 \quad \text{and} \quad z_2 = \frac{25.5 - 25}{1.5/\sqrt{50}} = \frac{0.5}{0.21213} \approx 2.36. \]

So, the probability that the mean time the 50 people spend driving each day is
between 24.7 and 25.5 minutes is

\[ P(24.7 < \bar{x} < 25.5) = P(-1.41 < z < 2.36) = P(z < 2.36) - P(z < -1.41) = 0.9909 - 0.0793 = 0.9116. \]

Interpretation Of the samples of 50 drivers ages 15 to 19, 91.16% will have a
mean driving time that is between 24.7 and 25.5 minutes, as shown in the graph
at the left. This implies that, assuming the value of \(\mu = 25\) is correct, only
8.84% of such sample means will lie outside the given interval.
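
The probability in Example 4 can also be found with technology other than a calculator. The following is a minimal Python sketch, not part of the original text, that models the sampling distribution with `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 25.0, 1.5, 50

# Sampling distribution of the sample mean: mean mu, standard error sigma/sqrt(n).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# Area between the two sample means.
p = xbar_dist.cdf(25.5) - xbar_dist.cdf(24.7)
print(f"P(24.7 < x-bar < 25.5) is about {p:.4f}")
```

Because the z-scores are not rounded to two decimal places here, the result (about 0.912) differs slightly from the table-based value of 0.9116.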
STUDY TIP

Before you find probabilities for intervals of the sample mean \(\bar{x}\), use the
Central Limit Theorem to determine the mean and the standard deviation of
the sampling distribution of the sample means. That is, calculate \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\).

[Figure for Example 5: Distribution of Sample Means with n = 9; \(\mu = 7540\); horizontal axis: mean room and board (in dollars), 6300 to 8700]

In Example 5, you can use a TI-83/84 Plus to find the probability
automatically.

> Try It Yourself 4


You randomly select 100 drivers ages 15 to 19 from Example 4. What is the
probability that the mean time they spend driving each day is between 24.7
and 25.5 minutes? Use \(\mu = 25\) and \(\sigma = 1.5\) minutes.


a. Use the Central Limit Theorem to find \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\) and sketch the sampling
distribution of the sample means.

b. Find the z-scores that correspond to \(\bar{x} = 24.7\) minutes and \(\bar{x} = 25.5\)
minutes.

c. Find the cumulative area that corresponds to each z-score and calculate the 
probability. 


d. Interpret the results. Answer: Page A39 


EXAMPLE 5 


» Finding Probabilities for Sampling Distributions 


The mean room and board expense per year at four-year colleges is $7540. You 
randomly select 9 four-year colleges. What is the probability that the mean 
room and board is less than $7800? Assume that the room and board expenses 
are normally distributed with a standard deviation of $1245. (Adapted from 
National Center for Education Statistics) 


> Solution 


Because the population is normally distributed, you can use the Central
Limit Theorem to conclude that the distribution of sample means is normally
distributed, with a mean of $7540 and a standard deviation of $415.

\[ \mu_{\bar{x}} = \mu = 7540 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1245}{\sqrt{9}} = 415 \]

The graph of this distribution is shown at the left. The area to the left of $7800
is shaded. The z-score that corresponds to $7800 is

\[ z = \frac{7800 - 7540}{1245/\sqrt{9}} = \frac{260}{415} \approx 0.63. \]

So, the probability that the mean room and board expense is less than $7800 is

\[ P(\bar{x} < 7800) = P(z < 0.63) = 0.7357. \]

Interpretation So, 73.57% of such samples with n = 9 will have a mean less
than $7800 and 26.43% of these sample means will lie outside this interval.
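
Rounding the z-score before the table lookup has a small effect on the answer. The following is a minimal Python sketch, not part of the original text, that computes the probability in Example 5 both ways; the names `p_table` and `p_unrounded` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7540.0, 1245.0, 9

standard_error = sigma / sqrt(n)      # 1245 / 3 = 415
z = (7800.0 - mu) / standard_error    # 260 / 415

standard_normal = NormalDist()
p_table = standard_normal.cdf(round(z, 2))  # z rounded to two places, as in a table lookup
p_unrounded = standard_normal.cdf(z)        # z kept at full precision

print(f"z = {z:.4f}")
print(f"P(z < {round(z, 2)}) is about {p_table:.4f}")
print(f"P(z < {z:.4f}) is about {p_unrounded:.4f}")
```

The two results agree to about two decimal places, which is typical of the difference between table-based and technology-based answers.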


> Try It Yourself 5 


The average sales price of a single-family house in the United States is 
$290,600. You randomly select 12 single-family houses. What is the probability 
that the mean sales price is more than $265,000? Assume that the sales prices 
are normally distributed with a standard deviation of $36,000. (Adapted from
The U.S. Commerce Department)


a. Use the Central Limit Theorem to find \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\) and sketch the sampling
distribution of the sample means.

b. Find the z-score that corresponds to \(\bar{x}\) = $265,000.

c. Find the cumulative area that corresponds to the z-score and calculate the 
probability. 


d. Interpret the results. Answer: Page A39 
The Central Limit Theorem can also be used to investigate unusual events.
An unusual event is one that occurs with a probability of less than 5%. 


EXAMPLE 6 


> Finding Probabilities for x and \(\bar{x}\)


An education finance corporation claims that the average credit card debts 
carried by undergraduates are normally distributed, with a mean of $3173 and 
a standard deviation of $1120. (Adapted from Sallie Mae) 


1. What is the probability that a randomly selected undergraduate, who is a 
credit card holder, has a credit card balance less than $2700? 


2. You randomly select 25 undergraduates who are credit card holders. What 
is the probability that their mean credit card balance is less than $2700? 


3. Compare the probabilities from (1) and (2) and interpret your answer in 
terms of the corporation’s claim. 
> Solution


1. In this case, you are asked to find the probability associated with a certain
value of the random variable x. The z-score that corresponds to x = $2700 is

\[ z = \frac{x - \mu}{\sigma} = \frac{2700 - 3173}{1120} \approx -0.42. \]

So, the probability that the card holder has a balance less than $2700 is

\[ P(x < 2700) = P(z < -0.42) = 0.3372. \]

2. Here, you are asked to find the probability associated with a sample mean
\(\bar{x}\). The z-score that corresponds to \(\bar{x}\) = $2700 is

\[ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{2700 - 3173}{1120/\sqrt{25}} = \frac{-473}{224} \approx -2.11. \]

So, the probability that the mean credit card balance of the 25 card holders
is less than $2700 is

\[ P(\bar{x} < 2700) = P(z < -2.11) = 0.0174. \]

3. Interpretation Although there is about a 34% chance that an undergraduate
will have a balance less than $2700, there is only about a 2% chance that the
mean of a sample of 25 will have a balance less than $2700. Because there
is only a 2% chance that the mean of a sample of 25 will have a balance less
than $2700, this is an unusual event. So, it is possible that the corporation’s
claim that the mean is $3173 is incorrect.

STUDY TIP

To find probabilities for individual members of a population with
a normally distributed random variable x, use the formula

\[ z = \frac{x - \mu}{\sigma}. \]

To find probabilities for the mean \(\bar{x}\) of a sample size n, use the formula

\[ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}. \]
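
The contrast between an individual value and a sample mean in Example 6 can be sketched in code. The following minimal Python sketch is not part of the original text; it uses the 5% threshold for an unusual event stated before the example, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3173.0, 1120.0, 25
cutoff = 2700.0
UNUSUAL = 0.05  # an unusual event has a probability of less than 5%

# Part 1: one randomly selected card holder (population distribution).
p_individual = NormalDist(mu, sigma).cdf(cutoff)

# Part 2: the mean of 25 card holders (sampling distribution of the mean).
p_sample_mean = NormalDist(mu, sigma / sqrt(n)).cdf(cutoff)

print(f"P(x < 2700) is about {p_individual:.4f}; unusual: {p_individual < UNUSUAL}")
print(f"P(x-bar < 2700) is about {p_sample_mean:.4f}; unusual: {p_sample_mean < UNUSUAL}")
```

The only difference between the two calculations is the standard deviation: \(\sigma\) for an individual value and \(\sigma/\sqrt{n}\) for the sample mean. The values differ slightly from the table-based answers because the z-scores are not rounded.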


> Try It Yourself 6 


A consumer price analyst claims that prices for liquid crystal display (LCD) 
computer monitors are normally distributed, with a mean of $190 and a 
standard deviation of $48. (1) What is the probability that a randomly selected 
LCD computer monitor costs less than $200? (2) You randomly select 10 LCD 
computer monitors. What is the probability that their mean cost is less than 
$200? (3) Compare these two probabilities. 


a. Find the z-scores that correspond to x and \(\bar{x}\).

b. Use the Standard Normal Table to find the probability associated with each
z-score.

c. Compare the probabilities and interpret your answer. Answer: Page A39 
EXERCISES


BUILDING BASIC SKILLS AND VOCABULARY


In Exercises 1-4, a population has a mean \(\mu = 150\) and a standard deviation
\(\sigma = 25\). Find the mean and standard deviation of a sampling distribution of
sample means with the given sample size n.


3 Ai t= 30 2. n = 100 
44 50 4. n = 1000 


True or False? In Exercises 5-8, determine whether the statement is true or
false. If it is false, rewrite it as a true statement. 


5. As the size of a sample increases, the mean of the distribution of sample 
means increases. 


6. As the size of a sample increases, the standard deviation of the distribution of 
sample means increases. 


7. A sampling distribution is normal only if the population is normal.


8. If the size of a sample is at least 30, you can use z-scores to determine the
probability that a sample mean falls in a given interval of the sampling
distribution.


Graphical Analysis In Exercises 9 and 10, the graph of a population 
distribution is shown with its mean and standard deviation. Assume that a sample 
size of 100 is drawn from each population. Decide which of the graphs labeled 
(a)-(c) would most closely resemble the sampling distribution of the sample means 
for each graph. Explain your reasoning. 


9. The waiting time (in seconds) at a traffic signal during a red light

10. The annual snowfall (in feet) for a central New York state county

[Figures for Exercises 9 and 10: each population distribution is shown with candidate graphs (a)-(c) of the sampling distribution; in Exercise 10 the horizontal axes show snowfall (in feet).]

Verifying Properties of Sampling Distributions In Exercises 11 and 12,
find the mean and standard deviation of the population. List all samples (with 
replacement) of the given size from that population. Find the mean and standard 
deviation of the sampling distribution and compare them with the mean and 
standard deviation of the population. 


11. The number of DVDs rented by each of four families in the past month is 8, 
4, 16, and 2. Use a sample size of 3.


12. Four friends paid the following amounts for their MP3 players: $200, $130, 
$270, and $230. Use a sample size of 2. 


Finding Probabilities In Exercises 13-16, the population mean and standard
deviation are given. Find the required probability and determine whether the given
sample mean would be considered unusual. If convenient, use technology to find
the probability.


13. For a sample of n = 64, find the probability of a sample mean being less than
24.3 if \(\mu = 24\) and \(\sigma = 1.25\).


14. For a sample of n = 100, find the probability of a sample mean being greater
than 24.3 if \(\mu = 24\) and \(\sigma = 1.25\).


15. For a sample of n = 45, find the probability of a sample mean being greater
than 551 if \(\mu = 550\) and \(\sigma = 3.7\).


16. For a sample of n = 36, find the probability of a sample mean being less than
12,750 or greater than 12,753 if \(\mu = 12{,}750\) and \(\sigma = 1.7\).


USING AND INTERPRETING CONCEPTS


Using the Central Limit Theorem In Exercises 17-22, use the Central Limit
Theorem to find the mean and standard error of the mean of the indicated 
sampling distribution. Then sketch a graph of the sampling distribution. 
17. Employed Persons The amounts of time employees at a large corporation
work each day are normally distributed, with a mean of 7.6 hours and a 
standard deviation of 0.35 hour. Random samples of size 12 are drawn from 
the population and the mean of each sample is determined. 


18. Fly Eggs The numbers of eggs female house flies lay during their lifetimes 
are normally distributed, with a mean of 800 eggs and a standard deviation 
of 100 eggs. Random samples of size 15 are drawn from this population and 
the mean of each sample is determined. 


19. Photo Printers The mean price of photo printers on a website is $235 with 
a standard deviation of $62. Random samples of size 20 are drawn from this 
population and the mean of each sample is determined. 


20. Employees’ Ages The mean age of employees at a large corporation is 
47.2 years with a standard deviation of 3.6 years. Random samples of size 36 
are drawn from this population and the mean of each sample is determined. 


21. Fresh Vegetables The per capita consumption of fresh vegetables by 
people in the United States in a recent year was normally distributed, with a 
mean of 188.4 pounds and a standard deviation of 54.5 pounds. Random 
samples of 25 are drawn from this population and the mean of each sample 
is determined. (Adapted from U.S. Department of Agriculture) 


22. Coffee The per capita consumption of coffee by people in the United 
States in a recent year was normally distributed, with a mean of 24.2 gallons 
and a standard deviation of 8.1 gallons. Random samples of 30 are drawn 
from this population and the mean of each sample is determined. (Adapted 
from U.S. Department of Agriculture) 


23. Repeat Exercise 17 for samples of size 24 and 36. What happens to the mean 
and the standard deviation of the distribution of sample means as the size of 
the sample increases? 


24. Repeat Exercise 18 for samples of size 30 and 45. What happens to the mean 
and the standard deviation of the distribution of sample means as the size of 
the sample increases? 


Finding Probabilities In Exercises 25-30, find the probabilities and interpret 
the results. If convenient, use technology to find the probabilities. 


25. Salaries The population mean annual salary for environmental compliance 
specialists is about $63,500. A random sample of 35 specialists is drawn 
from this population. What is the probability that the mean salary of the 
sample is less than $60,000? Assume σ = $6100. (Adapted from Salary.com) 


26. Salaries The population mean annual salary for flight attendants is $56,275. 
A random sample of 48 flight attendants is selected from this population. 
What is the probability that the mean annual salary of the sample is less than 
$56,100? Assume σ = $1800. (Adapted from Salary.com) 


27. Gas Prices: New England During a certain week the mean price of gasoline 
in the New England region was $2.714 per gallon. A random sample of 32 gas 
stations is drawn from this population. What is the probability that the mean 
price for the sample was between $2.695 and $2.725 that week? Assume 
σ = $0.045. (Adapted from U.S. Energy Information Administration) 


28. Gas Prices: California During a certain week the mean price of gasoline in 
California was $2.999 per gallon. A random sample of 38 gas stations is 
drawn from this population. What is the probability that the mean price for 
the sample was between $3.010 and $3.025 that week? Assume σ = $0.049. 
(Adapted from U.S. Energy Information Administration) 


29. Heights of Women The mean height of women in the United States (ages 
20-29) is 64.3 inches. A random sample of 60 women in this age group is 
selected. What is the probability that the mean height for the sample is 
greater than 66 inches? Assume σ = 2.6 inches. (Source: National Center for 
Health Statistics) 


30. Heights of Men The mean height of men in the United States (ages 20-29) 
is 69.9 inches. A random sample of 60 men in this age group is selected. 
What is the probability that the mean height for the sample is greater 
than 70 inches? Assume σ = 3.0 inches. (Source: National Center for Health 
Statistics) 


31. Which Is More Likely? Assume that the heights given in Exercise 29 are 
normally distributed. Are you more likely to randomly select 1 woman with 
a height less than 70 inches or are you more likely to select a sample of 
20 women with a mean height less than 70 inches? Explain. 


32. Which Is More Likely? Assume that the heights given in Exercise 30 are 
normally distributed. Are you more likely to randomly select 1 man with a 
height less than 65 inches or are you more likely to select a sample of 
15 men with a mean height less than 65 inches? Explain. 


33. Make a Decision A machine used to fill gallon-sized paint cans is 
regulated so that the amount of paint dispensed has a mean of 128 ounces 
and a standard deviation of 0.20 ounce. You randomly select 40 cans and 
carefully measure the contents. The sample mean of the cans is 127.9 ounces. 
Does the machine need to be reset? Explain your reasoning. 


34. Make a Decision A machine used to fill half-gallon-sized milk containers 
is regulated so that the amount of milk dispensed has a mean of 64 ounces 
and a standard deviation of 0.11 ounce. You randomly select 40 containers 
and carefully measure the contents. The sample mean of the containers is 
64.05 ounces. Does the machine need to be reset? Explain your reasoning. 


35. Lumber Cutter Your lumber company has bought a machine that 
automatically cuts lumber. The seller of the machine claims that the 
machine cuts lumber to a mean length of 8 feet (96 inches) with a standard 
deviation of 0.5 inch. Assume the lengths are normally distributed. You 
randomly select 40 boards and find that the mean length is 96.25 inches. 


(a) Assuming the seller’s claim is correct, what is the probability that the 
mean of the sample is 96.25 inches or more? 


(b) Using your answer from part (a), what do you think of the seller’s claim? 


(c) Would it be unusual to have an individual board with a length of 
96.25 inches? Why or why not? 


36. Ice Cream Carton Weights A manufacturer claims that the mean weight of 
its ice cream cartons is 10 ounces with a standard deviation of 0.5 ounce. 
Assume the weights are normally distributed. You test 25 cartons and find 
their mean weight is 10.21 ounces. 


(a) Assuming the manufacturer’s claim is correct, what is the probability 
that the mean of the sample is 10.21 ounces or more? 


(b) Using your answer from part (a), what do you think of the 
manufacturer’s claim? 


(c) Would it be unusual to have an individual carton with a weight of 
10.21 ounces? Why or why not? 


37. Life of Tires A manufacturer claims that the life span of its tires is 
50,000 miles. You work for a consumer protection agency and you are testing 
this manufacturer’s tires. Assume the life spans of the tires are normally 
distributed. You select 100 tires at random and test them. The mean life span 
is 49,721 miles. Assume σ = 800 miles. 


(a) Assuming the manufacturer’s claim is correct, what is the probability 
that the mean of the sample is 49,721 miles or less? 


(b) Using your answer from part (a), what do you think of the manufacturer’s 
claim? 

(c) Would it be unusual to have an individual tire with a life span of 
49,721 miles? Why or why not? 


38. Brake Pads A brake pad manufacturer claims its brake pads will last for 
38,000 miles. You work for a consumer protection agency and you are 
testing this manufacturer’s brake pads. Assume the life spans of the brake 
pads are normally distributed. You randomly select 50 brake pads. In your 
tests, the mean life of the brake pads is 37,650 miles. Assume σ = 1000 
miles. 


(a) Assuming the manufacturer’s claim is correct, what is the probability 
that the mean of the sample is 37,650 miles or less? 


(b) Using your answer from part (a), what do you think of the manufacturer’s 
claim? 

(c) Would it be unusual to have an individual brake pad last for 37,650 miles? 
Why or why not? 


EXTENDING CONCEPTS 


39. SAT Scores The mean critical reading SAT score is 501, with a standard 
deviation of 112. A particular high school claims that its students have 
unusually high critical reading SAT scores. A random sample of 50 students 
from this school was selected, and the mean critical reading SAT score was 515. 
Is the high school justified in its claim? Explain. (Source: The College Board) 


40. Machine Calibrations A machine in a manufacturing plant is calibrated to 
produce a bolt that has a mean diameter of 4 inches and a standard deviation 
of 0.5 inch. An engineer takes a random sample of 100 bolts from this 
machine and finds the mean diameter is 4.2 inches. What are some possible 
consequences of these findings? 


Finite Correction Factor The formula for the standard error of the mean 
given in the Central Limit Theorem is based on an assumption that the population has 
infinitely many members. This is the case whenever sampling is done with replacement 
(each member is put back after it is selected), because the sampling process could be 
continued indefinitely. The formula is also valid if the sample size is small in 
comparison with the population. However, when sampling is done without replacement and 
the sample size n is more than 5% of the finite population of size N (n/N > 0.05), 
there is a finite number of possible samples. A finite correction factor, 

$$\sqrt{\frac{N - n}{N - 1}}$$

should be used to adjust the standard error. The sampling distribution of the 
sample means will be normal with a mean equal to the population mean, and the 
standard error of the mean will be 

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}.$$

In Exercises 41 and 42, determine if the finite correction factor should be used. 
If so, use it in your calculations when you find the probability. 


41. Gas Prices In a sample of 900 gas stations, the mean price of regular 
gasoline at the pump was $2.702 per gallon and the standard deviation was 
$0.009 per gallon. A random sample of size 55 is drawn from this population. 
What is the probability that the mean price per gallon is less than $2.698? 
(Adapted from U.S. Department of Energy) 


42. Old Faithful In a sample of 500 eruptions of the Old Faithful geyser at 
Yellowstone National Park, the mean duration of the eruptions was 
3.32 minutes and the standard deviation was 1.09 minutes. A random sample 
of size 30 is drawn from this population. What is the probability that the 
mean duration of eruptions is between 2.5 minutes and 4 minutes? (Adapted 
from Yellowstone National Park) 


Sampling Distribution of Sample Proportions The sample mean is not 
the only statistic with a sampling distribution. Every sample statistic, such as the 
sample median, the sample standard deviation, and the sample proportion, has a 
sampling distribution. For a random sample of size n, the sample proportion is the 
number of individuals in the sample with a specified characteristic divided by the 
sample size. The sampling distribution of sample proportions is the distribution 
formed when sample proportions of size n are repeatedly taken from a population 
where the probability of an individual with a specified characteristic is p. 


In Exercises 43-46, suppose three births are randomly selected. There are two 
equally possible outcomes for each birth, a boy (b) or a girl (g). The number of 
boys can equal 0, 1, 2, or 3. These correspond to sample proportions of 0, 1/3, 2/3, 
and 1. 


43. List the eight possible samples that can result from randomly selecting 
three births. For instance, let bbb represent a sample of three boys. Make a 
table that shows each sample, the number of boys in each sample, and the 
proportion of boys in each sample. 


44. Use the table from Exercise 43 to construct the sampling distribution of the 
sample proportion of boys from three births. Graph the sampling distribution 
using a probability histogram. What do you notice about the spread of the 
histogram as compared to the binomial probability distribution for the 
number of boys in each sample? 


45. Let x = 1 represent a boy and x = 0 represent a girl. Using these values, find 
the sample mean for each sample. What do you notice? 


46. Construct a sampling distribution of the sample proportion of boys from four 
births. 


47. Heart Transplants About 77% of all female heart transplant patients 
will survive for at least 3 years. One hundred five female heart transplant 
patients are randomly selected. What is the probability that the sample 
proportion surviving for at least 3 years will be less than 70%? Interpret your 
results. Assume the sampling distribution of sample proportions is a normal 
distribution. The mean of the sample proportion is equal to the population 
proportion p, and the standard deviation is equal to $\sqrt{pq/n}$. (Source: American 
Heart Association) 


Sampling Distributions 


The sampling distributions applet allows you to investigate sampling distributions by 
repeatedly taking samples from a population. The top plot displays the 
distribution of a population. Several options are available for the population 
distribution (Uniform, Bell-shaped, Skewed, Binary, and Custom). When SAMPLE 
is clicked, N random samples of size n will be repeatedly selected from the 
population. The sample statistics specified in the bottom two plots will be updated 
for each sample. If N is set to 1 and n is less than or equal to 50, the display will show, 
in an animated fashion, the points selected from the population dropping into the 
second plot and the corresponding summary statistic values dropping into the third 
and fourth plots. Click RESET to stop an animation and clear existing results. 
Summary statistics for each plot are shown in the panel at the left of the plot. 


[Applet screenshot: four stacked plots. Population (can be changed with mouse): Uniform, mean 25, median 25, std. dev. 14.4338. Below it: Sample data, Sample Means, and Sample Medians, each with a summary-statistics panel.] 


Explore 


Step 1 Specify a distribution. 

Step 2 Specify values of n and N. 

Step 3 Specify what to display in the bottom two graphs. 
Step 4 Click SAMPLE to generate the sampling distributions. 


Draw Conclusions 


1. Run the simulation using n = 30 and N = 10 for a uniform, a bell-shaped, and a 
skewed distribution. What is the mean of the sampling distribution of the sample 
means for each distribution? For each distribution, is this what you would expect? 


2. Run the simulation using n = 50 and N = 10 for a bell-shaped distribution. 
What is the standard deviation of the sampling distribution of the sample 
means? According to the formula, what should the standard deviation of the 
sampling distribution of the sample means be? Is this what you would expect? 
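
For readers without access to the applet, the experiment can be imitated in a few lines of Python. This is only a sketch under stated assumptions: the population is taken to be uniform on 0 to 50 (which matches the mean of 25 and standard deviation of 14.4338 shown for the applet's uniform population), the helper names `draw_uniform` and `sample_means` are hypothetical, and N = 1000 samples are used instead of the N = 10 in the exercises so that the pattern is easier to see.

```python
import random
import statistics


def draw_uniform(rng):
    """One value from the assumed population: uniform on 0 to 50."""
    return rng.uniform(0, 50)


def sample_means(draw, n, N, seed=0):
    """Take N random samples of size n and return the N sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(N)]


means = sample_means(draw_uniform, n=30, N=1000)

# Mean of the sample means: expected to be close to the population mean, 25.
print(statistics.fmean(means))

# Standard deviation of the sample means: expected to be close to
# sigma / sqrt(n) = 14.4338 / sqrt(30).
print(statistics.stdev(means), 14.4338 / 30 ** 0.5)
```

Changing `draw_uniform` to a skewed or bell-shaped generator is one way to repeat the comparison the exercises ask for.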


Normal Approximations to Binomial Distributions 


WHAT YOU SHOULD LEARN 


> How to decide when a normal 
distribution can approximate 
a binomial distribution 


> How to find the continuity 
correction 


> How to use a normal 
distribution to approximate 
binomial probabilities 


Approximating a Binomial Distribution > Continuity Correction 
> Approximating Binomial Probabilities 


APPROXIMATING A BINOMIAL DISTRIBUTION 


In Section 4.2, you learned how to find binomial probabilities. For instance, if a 
surgical procedure has an 85% chance of success and a doctor performs the 
procedure on 10 patients, it is easy to find the probability of exactly two 
successful surgeries. 

But what if the doctor performs the surgical procedure on 150 patients and 
you want to find the probability of fewer than 100 successful surgeries? To do 
this using the techniques described in Section 4.2, you would have to use the 
binomial formula 100 times and find the sum of the resulting probabilities. This 
approach is not practical, of course. A better approach is to use a normal 
distribution to approximate the binomial distribution. 


NORMAL APPROXIMATION TO A BINOMIAL 
DISTRIBUTION 


If np ≥ 5 and nq ≥ 5, then the binomial random variable x is approximately 
normally distributed, with mean 

$\mu = np$

and standard deviation 

$\sigma = \sqrt{npq}$

where n is the number of independent trials, p is the probability of success in 
a single trial, and q is the probability of failure in a single trial. 


To see why this result is valid, look at the following binomial distributions for 
p = 0.25, q = 1 − 0.25 = 0.75, and n = 4, n = 10, n = 25, and n = 50. Notice 
that as n increases, the histogram approaches a normal curve. 

[Figure: four binomial probability histograms of P(x) against x for p = 0.25: n = 4 (np = 1, nq = 3); n = 10 (np = 2.5, nq = 7.5); n = 25 (np = 6.25, nq = 18.75); n = 50 (np = 12.5, nq = 37.5).] 


STUDY TIP 
Properties of a binomial experiment 
• n independent trials 
• Two possible outcomes: success or failure 
• Probability of success is p; probability of failure is q = 1 − p 
• p is constant for each trial 
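
The way the histograms for p = 0.25 settle toward a normal curve as n grows can also be checked numerically. The sketch below is one possible measure, not the text's method: for each n it compares the exact binomial probability P(x) with the height of the normal density with mean np and standard deviation √(npq) at each integer x, and reports the largest gap. The function name `max_gap` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def max_gap(n, p):
    """Largest absolute gap between the binomial P(x) and the matching
    normal density, taken over x = 0, 1, ..., n."""
    q = 1 - p
    normal = NormalDist(n * p, math.sqrt(n * p * q))
    return max(
        abs(math.comb(n, x) * p**x * q**(n - x) - normal.pdf(x))
        for x in range(n + 1)
    )


# Same p and n values as the four histograms described in the text.
for n in (4, 10, 25, 50):
    print(n, round(max_gap(n, 0.25), 4))
```

The gap for n = 50 comes out far smaller than the gap for n = 4, which is the numerical counterpart of the histogram approaching a normal curve.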


EXAMPLE 1 


Approximating a Binomial Distribution 


Two binomial experiments are listed. Decide whether you can use the normal 
distribution to approximate x, the number of people who reply yes. If you can, 
find the mean and standard deviation. If you cannot, explain why. (Source: 
Opinion Research Corporation) 


1. Sixty-two percent of adults in the United States have an HDTV in their 
home. You randomly select 45 adults in the United States and ask them if 
they have an HDTV in their home. 


2. Twelve percent of adults in the United States who do not have an HDTV 
in their home are planning to purchase one in the next two years. You 
randomly select 30 adults in the United States who do not have an HDTV 
and ask them if they are planning to purchase one in the next two years. 


> Solution 
1. In this binomial experiment, n = 45, p = 0.62, and q = 0.38. So, 
np = 45(0.62) = 27.9 
and 
nq = 45(0.38) = 17.1. 


Because np and nq are greater than 5, you can use a normal distribution 
with 

$\mu = np = 27.9$

and 

$\sigma = \sqrt{npq} = \sqrt{45 \cdot 0.62 \cdot 0.38} \approx 3.26$

to approximate the distribution of x. 


2. In this binomial experiment, n = 30, p = 0.12, and q = 0.88. So, 
np = 30(0.12) = 3.6 
and 
nq = 30(0.88) = 26.4. 


Because np < 5, you cannot use a normal distribution to approximate the 
distribution of x. 
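
The decision made in both parts of Example 1 can be sketched as a small Python function. The name `normal_approx_params` is a hypothetical helper; it returns the mean and standard deviation when np ≥ 5 and nq ≥ 5, and `None` when the normal approximation should not be used.

```python
import math


def normal_approx_params(n, p):
    """Return (mu, sigma) if np >= 5 and nq >= 5, otherwise None."""
    q = 1 - p
    if n * p >= 5 and n * q >= 5:
        return n * p, math.sqrt(n * p * q)
    return None


# Part 1 of Example 1: n = 45, p = 0.62 (mean about 27.9, sigma about 3.26).
print(normal_approx_params(45, 0.62))

# Part 2 of Example 1: n = 30, p = 0.12, so np = 3.6 < 5.
print(normal_approx_params(30, 0.12))
```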


> Try It Yourself 1 


Consider the following binomial experiment. Decide whether you can use the 
normal distribution to approximate x, the number of people who reply yes. If 
you can, find the mean and standard deviation. If you cannot, explain why. 
(Source: Opinion Research Corporation) 


Five percent of adults in the United States are planning to purchase a 
3D TV in the next two years. You randomly select 125 adults in the 
United States and ask them if they are planning to purchase a 3D TV in 
the next two years. 


a. Identify n, p, and q. 

b. Find the products np and nq. 

c. Decide whether you can use a normal distribution to approximate x. 
d. Find the mean μ and standard deviation σ, if appropriate. 


Answer: Page A39 


To use a continuity correction, simply subtract 0.5 from the lowest value 
and/or add 0.5 to the highest. 


CONTINUITY CORRECTION 


A binomial distribution is discrete and can be represented by a probability 
histogram. To calculate exact binomial probabilities, you can use the binomial 
formula for each value of x and add the results. Geometrically, this corresponds 
to adding the areas of bars in the probability histogram. Remember that each bar 
has a width of one unit and x is the midpoint of the interval. 

When you use a continuous normal distribution to approximate a binomial 
probability, you need to move 0.5 unit to the left and right of the midpoint to 
include all possible x-values in the interval. When you do this, you are making a 
continuity correction. 


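[Figure: the exact binomial probability P(x = c), shown as the histogram bar centered at c, beside its normal approximation P(c − 0.5 < x < c + 0.5), the area under the normal curve between c − 0.5 and c + 0.5.] 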


EXAMPLE 2 


Using a Continuity Correction 


Use a continuity correction to convert each of the following binomial intervals 
to a normal distribution interval. 


1. The probability of getting between 270 and 310 successes, inclusive 
2. The probability of getting at least 158 successes 
3. The probability of getting fewer than 63 successes 


> Solution 


1. The discrete midpoint values are 270, 271, ..., 310. The corresponding 
interval for the continuous normal distribution is 

269.5 < x < 310.5. 


2. The discrete midpoint values are 158, 159, 160, .... The corresponding 
interval for the continuous normal distribution is 

x > 157.5. 


3. The discrete midpoint values are ..., 60, 61, 62. The corresponding interval 
for the continuous normal distribution is 

x < 62.5. 
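
The three conversions in Example 2 follow one rule: subtract 0.5 from the lowest midpoint value and add 0.5 to the highest. A minimal sketch of that rule, assuming the binomial interval is described by its lowest and highest included values (the function name `continuity_interval` and the use of `None` for an open end are illustrative choices):

```python
def continuity_interval(low=None, high=None):
    """Convert the inclusive binomial interval low <= x <= high to the
    endpoints of the normal interval. None marks an end with no bound."""
    lower = None if low is None else low - 0.5
    upper = None if high is None else high + 0.5
    return lower, upper


print(continuity_interval(270, 310))  # (269.5, 310.5)
print(continuity_interval(low=158))   # (157.5, None): at least 158
print(continuity_interval(high=62))   # (None, 62.5): fewer than 63
```

Note that "fewer than 63" has to be restated as "at most 62" before the rule is applied, which is why the call uses `high=62`.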


> Try It Yourself 2 


Use a continuity correction to convert each of the following binomial intervals 
to a normal distribution interval. 


1. The probability of getting between 57 and 83 successes, inclusive 
2. The probability of getting at most 54 successes 


a. List the midpoint values for the binomial probability. 
b. Use a continuity correction to write the normal distribution interval. 


Answer: Page A39 


APPROXIMATING BINOMIAL PROBABILITIES 


In a survey of U.S. adults, people were asked if there should be a 
nationwide ban on smoking in all public places. The results of the survey 
are shown in the following pie chart. (Adapted from Rasmussen Reports) 

[Pie chart: Believe in nationwide smoking ban, 62%; Do not believe in nationwide smoking ban.] 

Assume the survey is a true indication of the proportion of the population 
who say there should be a nationwide ban on smoking in all public places. 
If you sampled 50 adults at random, what is the probability that between 
25 and 30, inclusive, would say there should be a nationwide ban on 
smoking in all public places? 


GUIDELINES 


Using a Normal Distribution to Approximate Binomial Probabilities 


   IN WORDS                                    IN SYMBOLS 
1. Verify that a binomial distribution         Specify n, p, and q. 
   applies. 
2. Determine if you can use a normal           Is np ≥ 5? 
   distribution to approximate x, the          Is nq ≥ 5? 
   binomial variable. 
3. Find the mean μ and standard deviation      $\mu = np$ 
   σ for the distribution.                     $\sigma = \sqrt{npq}$ 
4. Apply the appropriate continuity            Add or subtract 
   correction. Shade the corresponding         0.5 from endpoints. 
   area under the normal curve. 
5. Find the corresponding z-score(s).          $z = \frac{x - \mu}{\sigma}$ 
6. Find the probability.                       Use the Standard 
                                               Normal Table. 


EXAMPLE 3 


Approximating a Binomial Probability 


Sixty-two percent of adults in the United States have an HDTV in their home. 
You randomly select 45 adults in the United States and ask them if they have 
an HDTV in their home. What is the probability that fewer than 20 of them 
respond yes? (Source: Opinion Research Corporation) 


> Solution 

From Example 1, you know that you can use a normal distribution with 
μ = 27.9 and σ ≈ 3.26 to approximate the binomial distribution. Remember 
to apply the continuity correction for the value of x. In the binomial 
distribution, the possible midpoint values for “fewer than 20” are 

..., 17, 18, 19. 

To use a normal distribution, add 0.5 to the right-hand boundary 19 to get 
x = 19.5. The graph shows a normal curve with μ = 27.9 and 
σ ≈ 3.26 and a shaded area to the left of 19.5. The z-score that corresponds to 
x = 19.5 is 

$$z = \frac{x - \mu}{\sigma} = \frac{19.5 - 27.9}{3.26} \approx -2.58.$$

[Figure: normal curve with μ = 27.9 and a shaded area to the left of x = 19.5; horizontal axis: Number responding yes.] 

Using the Standard Normal Table, 

$$P(z < -2.58) = 0.0049.$$


Interpretation The probability that fewer than 20 people respond yes is 
approximately 0.0049, or about 0.49%. 
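
The guideline steps, as applied in Example 3, can be traced in Python using only the standard library. This is a sketch rather than the text's procedure: it replaces the Standard Normal Table with `statistics.NormalDist` and, for comparison, also sums the exact binomial probabilities for x = 0 through 19. Because the z-score is not rounded to two decimals, the approximate value may differ slightly from the table-based 0.0049.

```python
import math
from statistics import NormalDist

# Steps 1 and 2: binomial setting and the np, nq check.
n, p = 45, 0.62
q = 1 - p
assert n * p >= 5 and n * q >= 5

# Step 3: mean and standard deviation of the approximating normal curve.
mu = n * p
sigma = math.sqrt(n * p * q)

# Steps 4 to 6: "fewer than 20" means x <= 19, so the corrected boundary
# is 19.5; the probability is the area to the left of that boundary.
approx = NormalDist(mu, sigma).cdf(19.5)

# Exact binomial probability for comparison: P(0) + P(1) + ... + P(19).
exact = sum(math.comb(n, x) * p**x * q**(n - x) for x in range(20))

print(round(approx, 4), round(exact, 4))
```

Both values are well under 1%, so the conclusion that fewer than 20 yes responses would be unusual does not depend on which method is used.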


> Try It Yourself 3 


Five percent of adults in the United States are planning to purchase a 3D TV 
in the next two years. You randomly select 125 adults in the United States and 
ask them if they are planning to purchase a 3D TV in the next two years. What 
is the probability that more than 9 respond yes? (See Try It Yourself 1.) 
(Source: Opinion Research Corporation) 


a. Determine whether you can use a normal distribution to approximate the 
binomial variable (see part (c) of Try It Yourself 1). 

b. Find the mean μ and the standard deviation σ for the distribution (see part 
(d) of Try It Yourself 1). 

c. Apply a continuity correction to rewrite P(x > 9) and sketch a graph. 

d. Find the corresponding z-score. 

e. Use the Standard Normal Table to find the area to the left of z and calculate 
the probability. Answer: Page A39 


EXAMPLE 4 


Approximating a Binomial Probability 


Fifty-eight percent of adults say that they never wear a helmet when riding a 
bicycle. You randomly select 200 adults in the United States and ask them if 
they wear a helmet when riding a bicycle. What is the probability that at least 
120 adults will say they never wear a helmet when riding a bicycle? (Source: 
Consumer Reports National Research Center) 


STUDY TIP 
In a discrete distribution, there is a difference between P(x ≥ c) and 
P(x > c). This is true because the probability that x is exactly c is not 
0. In a continuous distribution, however, there is no difference between 
P(x ≥ c) and P(x > c) because the probability that x is exactly c is 0. 


> Solution Because np = 200 · 0.58 = 116 and nq = 200 · 0.42 = 84, the 
binomial variable x is approximately normally distributed, with 

$\mu = np = 116$ and $\sigma = \sqrt{npq} = \sqrt{200 \cdot 0.58 \cdot 0.42} \approx 6.98$. 

Using the continuity correction, you can rewrite the discrete probability 
P(x ≥ 120) as the continuous probability P(x ≥ 119.5). The graph shows a normal 
curve with μ = 116, σ ≈ 6.98, and a shaded area to the right of 119.5. The 
z-score that corresponds to 119.5 is 

$$z = \frac{x - \mu}{\sigma} = \frac{119.5 - 116}{6.98} \approx 0.50.$$

[Figure: normal curve with μ = 116 and a shaded area to the right of x = 119.5; horizontal axis: Number responding never.] 

So, the probability that at least 120 will say they never wear a helmet is 
approximately 

$$P(x \geq 119.5) = P(z \geq 0.50) = 1 - P(z \leq 0.50) = 1 - 0.6915 = 0.3085.$$

> Try It Yourself 4 


In Example 4, what is the probability that at most 100 adults will say they 
never wear a helmet when riding a bicycle? 


a. Determine whether you can use a normal distribution to approximate the 
binomial variable (see Example 4). 
b. Find the mean μ and the standard deviation σ for the distribution (see 
Example 4). 
c. Apply a continuity correction to rewrite P(x ≤ 100) and sketch a graph. 
d. Find the corresponding z-score. 
e. Use the Standard Normal Table to find the area to the left of z and 
calculate the probability. Answer: Page A39 


In Example 4, you can use a TI-83/84 Plus to find the probability once the 
mean, standard deviation, and continuity correction are calculated. Use 
10,000 for the upper bound. 


EXAMPLE 5 


Approximating a Binomial Probability 


A survey reports that 62% of Internet users use Windows® Internet Explorer® 
as their browser. You randomly select 150 Internet users and ask them whether 
they use Internet Explorer® as their browser. What is the probability that 
exactly 96 will say yes? (Source: Net Applications) 


> Solution 
Because np = 150 · 0.62 = 93 and nq = 150 · 0.38 = 57, the binomial 
variable x is approximately normally distributed, with 

$\mu = np = 93$ and $\sigma = \sqrt{npq} = \sqrt{150 \cdot 0.62 \cdot 0.38} \approx 5.94$. 

Using the continuity correction, you can rewrite the discrete probability 
P(x = 96) as the continuous probability P(95.5 < x < 96.5). The graph shows a 
normal curve with μ = 93, σ ≈ 5.94, and a shaded area between 95.5 and 96.5. 

[Figure: normal curve with μ = 93 and a shaded area between x = 95.5 and x = 96.5; horizontal axis: Number responding yes.] 

The z-scores that correspond to 95.5 and 96.5 are 

$$z_1 = \frac{95.5 - 93}{5.94} \approx 0.42 \quad \text{and} \quad z_2 = \frac{96.5 - 93}{5.94} \approx 0.59.$$

So, the probability that exactly 96 Internet users will say they use Internet 
Explorer® is 

$$P(95.5 < x < 96.5) = P(0.42 < z < 0.59) = P(z < 0.59) - P(z < 0.42) = 0.7224 - 0.6628 = 0.0596.$$

The approximation in Example 5 is almost exactly equal to the exact probability 
found using the binompdf( command on a TI-83/84 Plus. 


Interpretation The probability that exactly 96 of the Internet users will say 
they use Internet Explorer® is approximately 0.0596, or about 6%. 
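
The comparison between the approximation and the exact binomial probability in Example 5 can be reproduced without a calculator. The sketch below is an illustration in Python, not the TI-83/84 procedure: `math.comb` supplies the binomial coefficient for the exact value, and `statistics.NormalDist` gives the area between 95.5 and 96.5. The z-scores are not rounded here, so the approximate value may differ from 0.0596 in the third or fourth decimal place.

```python
import math
from statistics import NormalDist

n, p, k = 150, 0.62, 96
q = 1 - p

# Normal approximation with continuity correction: area from k - 0.5 to k + 0.5.
dist = NormalDist(n * p, math.sqrt(n * p * q))
approx = dist.cdf(k + 0.5) - dist.cdf(k - 0.5)

# Exact binomial probability of exactly k successes.
exact = math.comb(n, k) * p**k * q**(n - k)

print(round(approx, 4), round(exact, 4))
```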


> Try It Yourself 5 


A survey reports that 24% of Internet users use Mozilla® Firefox® as their 
browser. You randomly select 150 Internet users and ask them whether they 
use Firefox® as their browser. What is the probability that exactly 27 will say 
yes? (Source: Net Applications) 


a. Determine whether you can use a normal distribution to approximate the 
binomial variable. 
b. Find the mean μ and the standard deviation σ for the distribution. 
c. Apply a continuity correction to rewrite P(x = 27) and sketch a graph. 
d. Find the corresponding z-scores. 
e. Use the Standard Normal Table to find the area to the left of each z-score 
and calculate the probability. Answer: Page A39 


Exercises 


BUILDING BASIC SKILLS AND VOCABULARY 


1. What are the properties of a binomial experiment? 


2. What are the conditions for using a normal distribution to approximate a 
binomial distribution? 


In Exercises 3-6, the sample size n, probability of success p, and probability of 
failure q are given for a binomial experiment. Decide whether you can use a 
normal distribution to approximate the random variable x. 


3. n = 24, p = 0.85, q = 0.15 4. n = 15, p = 0.70, q = 0.30 
5. n = 18, p = 0.90, q = 0.10 6. n = 20, p = 0.65, q = 0.35 


Approximating a Binomial Distribution In Exercises 7-12, a binomial 
experiment is given. Decide whether you can use a normal distribution to 
approximate the binomial distribution. If you can, find the mean and standard 
deviation. If you cannot, explain why. 


7. House Contract A survey of U.S. adults found that 85% read every word or 
at least enough to understand a contract for buying or selling a home before 
signing. You randomly select 10 adults and ask them if they read every word 
or at least enough to understand a contract for buying or selling a home 
before signing. (Source: FindLaw.com) 


8. Organ Donors A survey of U.S. adults found that 63% would want their 
organs transplanted into a patient who needs them if they were killed in an 
accident. You randomly select 20 adults and ask them if they would want 
their organs transplanted into a patient who needs them if they were killed 
in an accident. (Source: USA Today) 


9. Multivitamins A survey of U.S. adults found that 55% have used a 
multivitamin in the past 12 months. You randomly select 50 adults and ask them if 
they have used a multivitamin in the past 12 months. (Source: Harris Interactive) 


10. Happiness at Work A survey of U.S. adults found that 19% are happy with 
their current employer. You randomly select 30 adults and ask them if they 
are happy with their current employer. (Source: Opinion Research Corporation) 


11. Going Green A survey of U.S. adults found that 76% would pay more for 
an environmentally friendly product. You randomly select 20 adults and ask 
them if they would pay more for an environmentally friendly product. 
(Source: Opinion Research Corporation) 


12. Online Habits A survey of U.S. adults found that 61% look online for 
health information. You randomly select 15 adults and ask them if they look 
online for health information. (Source: Pew Research Center) 


In Exercises 13-16, use a continuity correction and match the binomial probability 
statement with the corresponding normal distribution statement. 


Binomial Probability     Normal Probability 
13. P(x > 109)           (a) P(x > 109.5) 
14. P(x ≥ 109)           (b) P(x < 108.5) 
15. P(x ≤ 109)           (c) P(x ≤ 109.5) 
16. P(x < 109)           (d) P(x ≥ 108.5) 


In Exercises 17-22, a binomial probability is given. Write the probability in words. 
Then, use a continuity correction to convert the binomial probability to a normal 
distribution probability. 


17. P(x < 25) 18. P(x = 110) 19. P(x = 33) 
20. P(x > 65) 21. P(x < 150) 22. P(55 < x < 60) 
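
The continuity correction used in Exercises 13-22 can be sketched in a few lines of Python. This is only an illustration, not part of the text: the function name continuity_correction and the value c = 40 are assumptions chosen for the example. Each binomial statement about a whole number c is moved by 0.5 so that the normal interval covers the full width of the corresponding histogram bars.

```python
def continuity_correction(relation, c):
    """Return (lower, upper) bounds of the normal interval that corresponds
    to the binomial statement 'x <relation> c'. None means unbounded."""
    if relation == "=":       # exactly c
        return (c - 0.5, c + 0.5)
    if relation == ">=":      # at least c
        return (c - 0.5, None)
    if relation == ">":       # more than c
        return (c + 0.5, None)
    if relation == "<=":      # at most c
        return (None, c + 0.5)
    if relation == "<":       # fewer than c
        return (None, c - 0.5)
    raise ValueError("relation must be one of =, >=, >, <=, <")


# Hypothetical value, not taken from the exercises.
for relation in ["=", ">=", ">", "<=", "<"]:
    print(relation, 40, "->", continuity_correction(relation, 40))
```

Note that "at least" and "fewer than" use c - 0.5, while "more than" and "at most" use c + 0.5.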


USING AND INTERPRETING CONCEPTS 


Approximating Binomial Probabilities In Exercises 23-30, decide whether 
you can use a normal distribution to approximate the binomial distribution. If you 
can, use the normal distribution to approximate the indicated probabilities and 
sketch their graphs. If you cannot, explain why and use a binomial distribution 
to find the indicated probabilities. 


23. Internet Use A survey of U.S. adults ages 18-29 found that 93% use the 
Internet. You randomly select 100 adults ages 18-29 and ask them if they use 
the Internet. (Source: Pew Research Center) 

(a) Find the probability that exactly 90 people say they use the Internet. 
(b) Find the probability that at least 90 people say they use the Internet. 
(c) Find the probability that fewer than 90 people say they use the Internet. 
(d) Are any of the probabilities in parts (a)-(c) unusual? Explain. 

24. Internet Use A survey of U.S. adults ages 50-64 found that 70% use the 
Internet. You randomly select 80 adults ages 50-64 and ask them if they use 
the Internet. (Source: Pew Research Center) 
(a) Find the probability that at least 70 people say they use the Internet. 
(b) Find the probability that exactly 50 people say they use the Internet. 
(c) Find the probability that more than 60 people say they use the Internet. 
(d) Are any of the probabilities in parts (a)-(c) unusual? Explain. 

25. Favorite Sport A survey of U.S. adults found that 35% say their favorite 
sport is professional football. You randomly select 150 adults and ask them 
if their favorite sport is professional football. (Source: Harris Interactive) 


(a) Find the probability that at most 75 people say their favorite sport is 
professional football. 


(b) Find the probability that more than 40 people say their favorite sport is 
professional football. 


(c) Find the probability that between 50 and 60 people, inclusive, say their 
favorite sport is professional football. 


(d) Are any of the probabilities in parts (a)-(c) unusual? Explain. 

26. College Graduates About 34% of workers in the United States are college 
graduates. You randomly select 50 workers and ask them if they are a college 
graduate. (Source: U.S. Bureau of Labor Statistics) 

(a) Find the probability that exactly 12 workers are college graduates. 
(b) Find the probability that more than 23 workers are college graduates. 
(c) Find the probability that at most 18 workers are college graduates. 


(d) A committee is looking for 30 working college graduates to volunteer at 
a career fair. The committee randomly selects 125 workers. What is the 
probability that there will not be enough college graduates? 

27. Public Transportation Five percent of workers in the United States use 
public transportation to get to work. You randomly select 250 workers and 
ask them if they use public transportation to get to work. (Source: U.S. Census 
Bureau) 


(a) Find the probability that exactly 16 workers will say yes. 
(b) Find the probability that at least 9 workers will say yes. 
(c) Find the probability that fewer than 16 workers will say yes. 


(d) A transit authority offers discount rates to companies that have at least 
30 employees who use public transportation to get to work. There are 
500 employees in a company. What is the probability that the company 
will not get the discount? 


28. Concert Tickets A survey of U.S. adults who attend at least one music 
concert a year found that 67% say concert tickets are too expensive. You 
randomly select 12 adults who attend at least one music concert a year and 
ask them if concert tickets are too expensive. (Source: Rasmussen Reports) 


(a) Find the probability that fewer than 4 people say that concert tickets are 
too expensive. 


(b) Find the probability that between 7 and 9 people, inclusive, say that 
concert tickets are too expensive. 


(c) Find the probability that at most 10 people say that concert tickets are 
too expensive. 


(d) Are any of the probabilities in parts (a)-(c) unusual? Explain. 


29. News A survey of U.S. adults ages 18-24 found that 34% get no news on 
an average day. You randomly select 200 adults ages 18-24 and ask them if 
they get news on an average day. (Source: Pew Research Center) 


(a) Find the probability that at least 85 people say they get no news on an 
average day. 


(b) Find the probability that fewer than 66 people say they get no news on 
an average day. 


(c) Find the probability that exactly 68 people say they get no news on an 
average day. 


(d) A college English teacher wants students to discuss current events. 
The teacher randomly selects six students from the class. What is the 
probability that none of the students can talk about current events 
because they get no news on an average day? 


30. Long Work Weeks A survey of U.S. workers found that 2.9% work more 
than 70 hours per week. You randomly select 10 workers in the United States 
and ask them if they work more than 70 hours per week. 


(a) Find the probability that at most 3 people say they work more than 
70 hours per week. 


(b) Find the probability that at least 1 person says he or she works more than 
70 hours per week. 


(c) Find the probability that more than 2 people say they work more than 
70 hours per week. 


(d) A large company is concerned about overworked employees who 
work more than 70 hours per week. The company randomly selects 
50 employees. What is the probability there will be no employee 
working more than 70 hours? 
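
The decision step asked for in Exercises 23-30 can be sketched as follows. The sketch assumes the criterion from this section that a normal approximation is reasonable when np ≥ 5 and nq ≥ 5. The function names and the values n = 40, p = 0.5, c = 22 are hypothetical and are not taken from any exercise. It compares the exact binomial probability of "at most c" with the continuity-corrected normal approximation.

```python
from math import comb, sqrt
from statistics import NormalDist


def can_use_normal(n, p):
    """Check the np >= 5 and nq >= 5 criterion (q = 1 - p)."""
    return n * p >= 5 and n * (1 - p) >= 5


def binomial_at_most(n, p, c):
    """Exact binomial probability P(x <= c)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def normal_approx_at_most(n, p, c):
    """Normal approximation to P(x <= c) with a continuity correction."""
    mu = n * p                      # mean of the binomial distribution
    sigma = sqrt(n * p * (1 - p))   # standard deviation of the binomial
    return NormalDist(mu, sigma).cdf(c + 0.5)


n, p, c = 40, 0.5, 22  # hypothetical experiment
if can_use_normal(n, p):
    print("normal approximation:", round(normal_approx_at_most(n, p, c), 4))
print("exact binomial:      ", round(binomial_at_most(n, p, c), 4))
```

When the criterion fails, as in can_use_normal(12, 0.1), only the exact binomial calculation should be used.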

Graphical Analysis In Exercises 31 and 32, write the binomial probability and 
the normal probability for the shaded region of the graph. Find the value of each 
probability and compare the results. 


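31. [Graph of P(x) for a binomial distribution with n = 16; horizontal axis x 
    marked from 10 to 16, vertical axis from 0.04 to 0.24] 

32. [Graph of P(x) for a binomial distribution with n = 12; horizontal axis x 
    marked from 0 to 12, vertical axis from 0.04 to 0.20] 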


EXTENDING CONCEPTS 


Getting Physical In Exercises 33 and 34, use the following information. 
The graph shows the results of a survey of adults in the United States ages 33 to 51 
who were asked if they participated in a sport. Seventy percent of adults said they 
regularly participated in at least one sport, and they gave their favorite sport. 


33. You randomly select 250 people in the United States ages 33 to 51 and ask 
them if they regularly participate in at least one sport. You find that 60% say 
no. How likely is this result? Do you think this sample is a good one? Explain 
your reasoning. 


34. You randomly select 300 people in the United States ages 33 to 51 and ask 
them if they regularly participate in at least one sport. Of the 200 who say 
yes, 9% say they participate in hiking. How likely is this result? Do you think 
this sample is a good one? Explain your reasoning. 


Testing a Drug In Exercises 35 and 36, use the following information. A drug 
manufacturer claims that a drug cures a rare skin disease 75% of the time. The 
claim is checked by testing the drug on 100 patients. If at least 70 patients are cured, 
this claim will be accepted. 


35. Find the probability that the claim will be rejected assuming that the 
manufacturer’s claim is true. 


36. Find the probability that the claim will be accepted assuming that the 
actual probability that the drug cures the skin disease is 65%. 

Uses 


Normal Distributions Normal distributions can be used to describe 
many real-life situations and are widely used in the fields of science, business, 
and psychology. They are the most important probability distributions in 
statistics and can be used to approximate other distributions, such as discrete 
binomial distributions. 

The most incredible application of the normal distribution lies in the 
Central Limit Theorem. This theorem states that no matter what type of 
distribution a population may have, as long as the sample size is at least 30, the 
distribution of sample means will be approximately normal. If the population 
is itself normal, then the distribution of sample means will be normal no 
matter how small the sample is. 
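
The statement of the Central Limit Theorem above can be checked with a small simulation. This sketch is an illustration only: the uniform population on the interval from 0 to 1, the sample size n = 30, the 2000 repetitions, and the random seed are all assumptions made for the example. The population is clearly not normal, yet the sample means cluster around the population mean with a spread close to σ/√n.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # sample size
num_samples = 2000       # number of samples drawn

# Population: uniform on [0, 1), with mean 0.5 and standard deviation sqrt(1/12).
population_mean = 0.5
population_sd = sqrt(1 / 12)

# Draw many samples of size n and record the mean of each one.
sample_means = [
    mean([rng.random() for _ in range(n)]) for _ in range(num_samples)
]

print("mean of sample means:", round(mean(sample_means), 4))
print("population mean:     ", population_mean)
print("sd of sample means:  ", round(stdev(sample_means), 4))
print("sigma / sqrt(n):     ", round(population_sd / sqrt(n), 4))
```

A histogram of sample_means would be approximately bell-shaped even though the population itself is flat.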

The normal distribution is essential to sampling theory. Sampling theory 
forms the basis of statistical inference, which you will begin to study in the next 
chapter. 


Abuses 


Unusual Events Suppose a population is normally distributed, with a mean 
of 100 and standard deviation of 15. It would not be unusual for an individual 
value taken from this population to be 115 or more. In fact, this will happen 
almost 16% of the time. It would be, however, highly unusual to take random 
samples of 100 values from that population and obtain a sample with a mean 
of 115 or more. Because the population is normally distributed, the sampling 
distribution of sample means will have a mean of 100 and a standard deviation 
of 15/√100 = 1.5. A sample mean of 115 lies 10 standard deviations above the 
mean. This would be an extremely unusual event. When an event this unusual 
occurs, it is a good idea to question the original claimed value of the mean. 
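
The two calculations in the paragraph above can be reproduced with Python's standard library. The variable names are illustrative. The sketch finds the probability that a single value is 115 or more, and then the z-score of a sample mean of 115 when n = 100.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 100

# Probability that one individual value is 115 or more.
population = NormalDist(mu, sigma)
p_individual = 1 - population.cdf(115)

# Standard deviation of the sampling distribution of sample means.
standard_error = sigma / sqrt(n)

# Number of standard errors that a sample mean of 115 lies above the mean.
z_sample_mean = (115 - mu) / standard_error

print("P(x >= 115) for one value:", round(p_individual, 4))
print("standard error:", standard_error)
print("z-score of a sample mean of 115:", z_sample_mean)
```

The first result is a little under 0.16, while a z-score of 10 corresponds to a probability that is essentially zero.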

Although normal distributions are common in many populations, people 
try to make non-normal statistics fit a normal distribution. The statistics used 
for normal distributions are often inappropriate when the distribution is 
obviously non-normal. 


EXERCISES 


1. Is It Unusual? A population is normally distributed, with a mean of 100 
and a standard deviation of 15. Determine if either of the following events 
is unusual. Explain your reasoning. 

a. The mean of a sample of 3 is 115 or more. 


b. The mean of a sample of 20 is 105 or more. 

2. Find the Error The mean age of students at a high school is 16.5, with a 
standard deviation of 0.7. You use the Standard Normal Table to help you 
determine that the probability of selecting one student at random and 
finding his or her age to be more than 17.5 years is about 8%. What is the 
error in this problem? 


3. Give an example of a distribution that might be non-normal. 

CHAPTER SUMMARY 


What did you learn?                                                      EXAMPLE(S)   REVIEW EXERCISES 

Section 5.1 
  How to interpret graphs of normal probability distributions            1, 2         1-6 
  How to find areas under the standard normal curve                      3-6          7-28 

Section 5.2 
  How to find probabilities for normally distributed variables           1-3          29-38 

Section 5.3 
  How to find a z-score given the area under the normal curve            1, 2         39-46 
  How to transform a z-score to an x-value                               3            47, 48 
      x = μ + zσ 
  How to find a specific data value of a normal distribution given       4, 5         49-52 
  the probability 

Section 5.4 
  How to find sampling distributions and verify their properties         1            53, 54 
  How to interpret the Central Limit Theorem                             2, 3         55, 56 
      μ_x̄ = μ          Mean 
      σ_x̄ = σ/√n       Standard deviation 
  How to apply the Central Limit Theorem to find the probability of a    4-6          57-62 
  sample mean 

Section 5.5 
  How to decide when a normal distribution can approximate a binomial    1            63, 64 
  distribution 
      μ = np            Mean 
      σ = √(npq)        Standard deviation 
  How to find the continuity correction                                  2            65-68 
  How to use a normal distribution to approximate binomial probabilities 3-5          69, 70 

REVIEW EXERCISES 


[Figure for Exercises 3 and 4: normal curves over an x-axis marked from 80 to 140] 


SECTION 5.1 


In Exercises 1 and 2, use the graph to estimate μ and σ. 


In Exercises 3 and 4, use the normal curves shown. 


3. Which normal curve has the greatest mean? Explain your reasoning. 


4. Which normal curve has the greatest standard deviation? Explain your 
reasoning. 


In Exercises 5 and 6, use the following information and standard scores to 
investigate observations about a normal population. A batch of 2500 resistors is 
normally distributed, with a mean resistance of 1.5 ohms and a standard deviation 
of 0.08 ohm. Four resistors are randomly selected and tested. Their resistances are 
measured at 1.32, 1.54, 1.66, and 1.78 ohms. 


5. How many standard deviations from the mean are these observations? 


6. Are there any unusual observations? 


In Exercises 7 and 8, find the area of the indicated region under the standard 
normal curve. If convenient, use technology to find the area. 


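7. [Graph: standard normal curve with the z-axis marked at 0 and 0.46] 

8. [Graph: standard normal curve with the z-axis marked at -2.35, -0.8, and 0] 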


In Exercises 9-20, find the indicated area under the standard normal curve. If 
convenient, use technology to find the area. 


9. To the left of z = 0.33 10. To the left of z = -1.95 
11. To the right of z = -0.57 12. To the right of z = 3.22 
13. To the left of z = -2.825 14. To the right of z = 0.015 
15. Between z = -1.64 and the mean 
16. Between z = -1.55 and z = 1.04 
17. Between z = 0.05 and z = 1.71 
18. Between z = -2.68 and z = 2.68 
19. To the left of z = -1.5 and to the right of z = 1.5 
20. To the left of z = 0.64 and to the right of z = 3.415 

[Figure for Exercises 21 and 22: normal curve with four points labeled A, B, C, and D] 


In Exercises 21 and 22, use the following information. In a recent year, the ACT 
scores for the reading portion of the test were normally distributed, with a mean 
of 21.4 and a standard deviation of 6.2. The test scores of four students selected at 
random are 17, 29, 8, and 23. (Source: ACT, Inc.) 


21. Without converting to z-scores, match the values with the letters A, B, C, and 
D on the given graph. 

22. Find the z-score that corresponds to each value and check your answers in 
Exercise 21. Are any of the values unusual? Explain. 


In Exercises 23-28, find the indicated probabilities. If convenient, use technology 
to find the probability. 


23. P(z < 1.28) 24. P(z > -0.74) 
25. P(-2.15 < z < 1.55) 26. P(0.42 < z < 3.15) 
27. P(z < -2.50 or z > 2.50) 28. P(z < 0 or z > 1.68) 


SECTION 5.2 


In Exercises 29-34, assume the random variable x is normally distributed, with 
mean μ = 74 and standard deviation σ = 8. Find the indicated probability. 


29. P(x < 84) 30. P(x < 55) 
31. P(x > 80) 32. P(x > 71.6) 
33. P(60 < x < 70) 34. P(72 < x < 82) 
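
Probabilities of the kind asked for in Exercises 29-34 can also be found with technology. The following is a minimal sketch using Python's standard library. The helper names and the parameters μ = 50 and σ = 10 are hypothetical and deliberately differ from the values in the exercises.

```python
from statistics import NormalDist


def prob_less_than(mu, sigma, b):
    """P(x < b) for a normal random variable."""
    return NormalDist(mu, sigma).cdf(b)


def prob_greater_than(mu, sigma, a):
    """P(x > a), found as the complement of the cumulative area."""
    return 1 - NormalDist(mu, sigma).cdf(a)


def prob_between(mu, sigma, a, b):
    """P(a < x < b), found as the difference of two cumulative areas."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)


# Hypothetical distribution: mean 50, standard deviation 10.
print(round(prob_less_than(50, 10, 40), 4))
print(round(prob_greater_than(50, 10, 60), 4))
print(round(prob_between(50, 10, 40, 60), 4))
```

Because 40 and 60 are one standard deviation from the mean, the last value should be close to the 68% given by the Empirical Rule.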


In Exercises 35 and 36, find the indicated probabilities. 


35. A study found that the mean migration distance of the green turtle was 2200 
kilometers and the standard deviation was 625 kilometers. Assuming that 
the distances are normally distributed, find the probability that a randomly 
selected green turtle migrates a distance of 


(a) less than 1900 kilometers. 

(b) between 2000 kilometers and 2500 kilometers. 
(c) greater than 2450 kilometers. 

(Adapted from Dorling Kindersley Visual Encyclopedia) 


36. The world’s smallest mammal is the Kitti’s hog-nosed bat, with a mean 
weight of 1.5 grams and a standard deviation of 0.25 gram. Assuming that the 
weights are normally distributed, find the probability of randomly selecting 
a bat that weighs 


(a) between 1.0 gram and 2.0 grams. 

(b) between 1.6 grams and 2.2 grams. 

(c) more than 2.2 grams. 

(Adapted from Dorling Kindersley Visual Encyclopedia) 


37. Can any of the events in Exercise 35 be considered unusual? Explain your 
reasoning. 


38. Can any of the events in Exercise 36 be considered unusual? Explain your 
reasoning. 


SECTION 5.3 


In Exercises 39-44, use the Standard Normal Table to find the z-score that 
corresponds to the given cumulative area or percentile. If the area is not in the 
table, use the entry closest to the area. If convenient, use technology to find the 
z-score. 

[Figure for Exercises 47-52: normal curve of braking distance (in meters)] 

39. 0.4721 40. 0.1 41. 0.8708 
42. P, 43. Pgs 44. Pr 


45. Find the z-score that has 30.5% of the distribution’s area to its right. 


46. Find the z-score for which 94% of the distribution's area lies between -z and z. 
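
Finding a z-score from an area, and then an x-value from a z-score, can be sketched with technology as follows. The areas and the parameters μ = 50 and σ = 10 are hypothetical and are not taken from the exercises. The function x_from_z is an illustrative name for the formula x = μ + zσ.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# z-score with a cumulative area of 0.95 to its left (the 95th percentile).
z_left_95 = standard_normal.inv_cdf(0.95)

# z-score with 20% of the area to its right: the area to the left is 0.80.
z_right_20 = standard_normal.inv_cdf(1 - 0.20)

# z-score for which the middle 90% lies between -z and z:
# the area to the left of z is 0.90 plus the lower tail of 0.05.
z_middle_90 = standard_normal.inv_cdf(0.90 + 0.05)


def x_from_z(mu, sigma, z):
    """Transform a z-score to an x-value using x = mu + z * sigma."""
    return mu + z * sigma


print(round(z_left_95, 3), round(z_right_20, 3), round(z_middle_90, 3))
print(x_from_z(50, 10, 1.5))
```

Areas to the right or in the middle are first converted to a cumulative area to the left, which is the form the Standard Normal Table uses.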


In Exercises 47-52, use the following information. On a dry surface, the braking 
distance (in meters) of a Cadillac Escalade can be approximated by a normal 
distribution, as shown in the graph at the left. (Adapted from Consumer Reports) 


47. Find the braking distance of a Cadillac Escalade that corresponds to 
z = -2.5. 


48. Find the braking distance of a Cadillac Escalade that corresponds to z = 1.2. 
49. What braking distance of a Cadillac Escalade represents the 95th percentile? 
50. What braking distance of a Cadillac Escalade represents the third quartile? 


51. What is the shortest braking distance of a Cadillac Escalade that can be in 
the top 10% of braking distances? 


52. What is the longest braking distance of a Cadillac Escalade that can be in the 
bottom 5% of braking distances? 


SECTION 5.4 


In Exercises 53 and 54, use the given population to find the mean and standard 
deviation of the population and the mean and standard deviation of the sampling 
distribution. Compare the values. 


53. A corporation has four executives. The number of minutes of overtime per 
week reported by each is 90, 120, 160, and 210. Draw three executives’ names 
from this population, with replacement. 


54. There are four residents sharing a house. The number of times each washes 
their car each month is 1, 2, 0, and 3. Draw two names from this population, 
with replacement. 


In Exercises 55 and 56, use the Central Limit Theorem to find the mean and 
standard error of the mean of the indicated sampling distribution. Then sketch a 
graph of the sampling distribution. 


55. The per capita consumption of citrus fruits by people in the United States in 
a recent year was normally distributed, with a mean of 76.0 pounds and a 
standard deviation of 20.5 pounds. Random samples of 35 people are drawn 
from this population and the mean of each sample is determined. (Adapted 
from U.S. Department of Agriculture) 


56. The per capita consumption of red meat by people in the United States in a 
recent year was normally distributed, with a mean of 108.3 pounds and a 
standard deviation of 35.1 pounds. Random samples of 40 people are drawn 
from this population and the mean of each sample is determined. (Adapted 
from U.S. Department of Agriculture) 


In Exercises 57-62, find the probabilities for the sampling distributions. Interpret 
the results. 


57. Refer to Exercise 35. A sample of 12 green turtles is randomly selected. Find 
the probability that the sample mean of the distance migrated is (a) less than 
1900 kilometers, (b) between 2000 kilometers and 2500 kilometers, and (c) 
greater than 2450 kilometers. Compare your answers with those in Exercise 35. 

58. Refer to Exercise 36. A sample of seven Kitti’s hog-nosed bats is randomly 
selected. Find the probability that the sample mean is (a) between 1.0 gram 
and 2.0 grams, (b) between 1.6 grams and 2.2 grams, and (c) more than 
2.2 grams. Compare your answers with those in Exercise 36. 


59. The mean annual salary for chauffeurs is $29,200. A sample of 45 chauffeurs 
is randomly selected. What is the probability that the mean annual salary 
is (a) less than $29,000 and (b) more than $31,000? Assume σ = $1500. 
(Source: Salary.com) 


60. The mean value of land and buildings per acre for farms is $1300. A sample 
of 36 farms is randomly selected. What is the probability that the mean value 
of land and buildings per acre is (a) less than $1400 and (b) more than $1150? 
Assume σ = $250. 


61. The mean price of houses in a city is $1.5 million with a standard deviation 
of $500,000. The house prices are normally distributed. You randomly select 
15 houses in this city. What is the probability that the mean price will be less 
than $1.125 million? 


62. Mean rent in a city is $500 per month with a standard deviation of $30. The 
rents are normally distributed. You randomly select 15 apartments in this 
city. What is the probability that the mean rent will be more than $525? 


SECTION 5.5 


In Exercises 63 and 64, a binomial experiment is given. Decide whether you can use 
a normal distribution to approximate the binomial distribution. If you can, find the 
mean and standard deviation. If you cannot, explain why. 


63. In a recent year, the American Cancer Society said that the five-year survival 
rate for new cases of stage 1 kidney cancer is 96%. You randomly select 
12 men who were new stage 1 kidney cancer cases this year and calculate the 
five-year survival rate of each. (Source: American Cancer Society, Inc.) 


64. A survey indicates that 75% of U.S. adults who go to the theater at least once 
a month think movie tickets are too expensive. You randomly select 30 adults 
and ask them if they think movie tickets are too expensive. (Source: Rasmussen 
Reports) 


In Exercises 65-68, write the binomial probability as a normal probability using 
the continuity correction. 


65. P(x = 25) 66. P(x = 36) 
67. P(x = 45) 68. P(x = 50) 


In Exercises 69 and 70, decide whether you can use a normal distribution to 
approximate the binomial distribution. If you can, use the normal distribution to 
approximate the indicated probabilities and sketch their graphs. If you cannot, 
explain why and use a binomial distribution to find the indicated probabilities. 


69. Seventy percent of children ages 12 to 17 keep at least part of their savings 
in a savings account. You randomly select 45 children and ask them if they 
keep at least part of their savings in a savings account. Find the probability 
that at most 20 children will say yes. (Source: International Communications 
Research for Merrill Lynch) 


70. Thirty-one percent of people in the United States have type A+ blood. You 
randomly select 15 people in the United States and ask them if their blood 
type is A+. Find the probability that more than 8 adults say they have A+ 
blood. (Source: American Association of Blood Banks) 

Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


1. Find each standard normal probability. 
(a) P(z > -2.54) 
(b) P(z < 3.09) 
(c) P(-0.88 < z < 0.88) 
(d) P(z < -1.445 or z > -0.715) 

2. Find each normal probability for the given parameters. 
(a) μ = 5.5, σ = 0.08, P(5.36 < x < 5.64) 
(b) μ = -8.2, σ = 7.84, P(-5.00 < x < 0) 
(c) μ = 18.5, σ = 9.25, P(x < 0 or x > 37) 


In Exercises 3-10, use the following information. Students taking a standardized 
IQ test had a mean score of 100 with a standard deviation of 15. Assume that the 
scores are normally distributed. (Adapted from Audiblox) 


3. Find the probability that a student had a score higher than 125. Is this an 
unusual event? Explain. 

4. Find the probability that a student had a score between 95 and 105. Is this an 
unusual event? Explain. 

5. What percent of the students had an IQ score that is greater than 112? 

6. If 2000 students are randomly selected, how many would be expected to have 
an IQ score that is less than 90? 

7. What is the lowest score that would still place a student in the top 5% of the 
scores? 

8. What is the highest score that would still place a student in the bottom 10% 
of the scores? 

9. A random sample of 60 students is drawn from this population. What is the 
probability that the mean IQ score is greater than 105? Interpret your result. 

10. Are you more likely to randomly select one student with an IQ score greater 
than 105 or are you more likely to randomly select a sample of 15 students 
with a mean IQ score greater than 105? Explain. 


In Exercises 11 and 12, use the following information. In a survey of adults under 
age 65, 81% say they are concerned about the amount and security of personal 
online data that can be accessed by cybercriminals and hackers. You randomly 
select 35 adults and ask them if they are concerned about the amount and security 
of personal online data that can be accessed by cybercriminals and hackers. 
(Source: Financial Times/Harris Poll) 


11. Decide whether you can use a normal distribution to approximate the 
binomial distribution. If you can, find the mean and standard deviation. If 
you cannot, explain why. 

12. Find the probability that at most 20 adults say they are concerned about 
the amount and security of personal online data that can be accessed by 
cybercriminals and hackers. Interpret the result. 

Real Statistics - Real Decisions 


You are the human resources director for a corporation and want to 
implement a health improvement program for employees to decrease 
employee medical absences. You perform a six-month study with a 
random sample of employees. Your goal is to decrease absences by 
50%. (Assume all data are normally distributed.) 


1. Preliminary Thoughts 


You got the idea for this health improvement program from a 
national survey in which 75% of people who responded said they 
would participate in such a program if offered by their employer. 
You randomly select 60 employees and ask them whether they 
would participate in such a program. 


(a) Find the probability that exactly 35 will say yes. 
(b) Find the probability that at least 40 will say yes. 
(c) Find the probability that fewer than 20 will say yes. 


(d) Based on the results in parts (a)-(c), explain why you chose to 
perform the study. 


2. Before the Program 


Before the study, the mean number of absences during a six-month 
period of the participants was 6, with a standard deviation of 1.5. An 
employee is randomly selected. 


(a) Find the probability that the employee’s number of absences is 
less than 5. 


(b) Find the probability that the employee’s number of absences is 
between 5 and 7. 


(c) Find the probability that the employee’s number of absences is 
more than 7. 


3. After the Program 
The graph at the right represents the results of the study. 


(a) What is the mean number of absences for employees? Explain 
how you know. 

(b) Based on the results, was the goal of decreasing absences by 
50% reached? 


(c) Describe how you would present your results to the board of 
directors of the corporation. 

[Figure for Exercise 3: graph of the study results, with absences on the horizontal axis] 

TECHNOLOGY MINITAB TI-83/84 PLUS 


AGE DISTRIBUTION IN THE UNITED STATES 

U.S. Census Bureau 
www.census.gov 

One of the jobs of the U.S. Census Bureau is to keep track of the 
age distribution in the country. The age distribution in 2009 is 
shown below. 

[Figure: "Age Distribution in the U.S.", a relative frequency histogram with 
age classes (in years) on the horizontal axis, marked at the class midpoints 
2, 7, 12, ..., 97, and relative frequency from 1% to 9% on the vertical axis] 

Class    Class midpoint    Relative frequency 
0-4      2                 6.9% 
5-9      7                 6.6% 
10-14    12                6.6% 
15-19    17                (illegible) 
20-24    22                6.9% 
25-29    27                7.0% 
30-34    32                6.4% 
35-39    37                6.9% 
40-44    42                7.1% 
45-49    47                7.5% 
50-54    52                (illegible) 
55-59    57                6.1% 
60-64    62                5.0% 
65-69    67                3.7% 
70-74    72                2.9% 
75-79    77                2.4% 
80-84    82                1.9% 
85-89    87                1.2% 
90-94    92                0.5% 
95-99    97                0.2% 

EXERCISES 

We used a technology tool to select random samples with n = 40 from the age 
distribution of the United States. The means of the 36 samples were as follows. 

28.14, 31.56, 36.86, 32.37, 36.12, 39.53, 
36.19, 39.02, 35.62, 36.30, 34.38, 32.98, 
36.41, 30.24, 34.19, 44.72, 38.84, 42.87, 
38.90, 34.71, 34.13, 38.25, 38.04, 34.07, 
39.74, 40.91, 42.63, 35.29, 35.91, 34.36, 
36.51, 36.47, 32.88, 37.33, 31.27, 35.80 

1. Enter the age distribution of the United States into a technology tool. Use 
the tool to find the mean age in the United States. 

2. Enter the set of sample means into a technology tool. Find the mean of the 
set of sample means. How does it compare with the mean age in the United 
States? Does this agree with the result predicted by the Central Limit 
Theorem? 

3. Are the ages of people in the United States normally distributed? Explain 
your reasoning. 

4. Sketch a relative frequency histogram for the 36 sample means. Use nine 
classes. Is the histogram approximately bell-shaped and symmetric? Does this 
agree with the result predicted by the Central Limit Theorem? 

5. Use a technology tool to find the standard deviation of the ages of people 
in the United States. 

6. Use a technology tool to find the standard deviation of the set of 36 sample 
means. How does it compare with the standard deviation of the ages? Does this 
agree with the result predicted by the Central Limit Theorem? 


Extended solutions are given in the Technology Supplement. 
Technical instruction is provided for MINITAB, Excel, and the TI-83/84 Plus. 
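
Python can serve as one more technology tool for the exercises above. The sketch below is not part of the Technology Supplement. It assumes the 36 sample means are as listed in the exercises, with commas restored where the printed list shows stray periods, and it uses the sample standard deviation. If the population standard deviation is wanted, statistics.pstdev can be used instead.

```python
from statistics import mean, stdev

# The 36 sample means (n = 40 each) listed in the exercises.
sample_means = [
    28.14, 31.56, 36.86, 32.37, 36.12, 39.53,
    36.19, 39.02, 35.62, 36.30, 34.38, 32.98,
    36.41, 30.24, 34.19, 44.72, 38.84, 42.87,
    38.90, 34.71, 34.13, 38.25, 38.04, 34.07,
    39.74, 40.91, 42.63, 35.29, 35.91, 34.36,
    36.51, 36.47, 32.88, 37.33, 31.27, 35.80,
]

mean_of_means = mean(sample_means)   # compare with the mean age (Exercise 2)
sd_of_means = stdev(sample_means)    # compare with sigma / sqrt(40) (Exercise 6)

print("number of samples:", len(sample_means))
print("mean of the sample means:", round(mean_of_means, 2))
print("standard deviation of the sample means:", round(sd_of_means, 2))
```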

1. A survey of voters in the United States found that 15% rate the U.S. health 
care system as excellent. You randomly select 50 voters and ask them how 
they rate the U.S. health care system. (Source: Rasmussen Reports) 


(a) Verify that the normal distribution can be used to approximate the 
binomial distribution. 


(b) Find the probability that at most 14 voters rate the U.S. health care 
system as excellent. 

(c) Is it unusual for 14 out of 50 voters to rate the U.S. health care system as 
excellent? Explain your reasoning. 


In Exercises 2 and 3, use the probability distribution to find the (a) mean, 
(b) variance, (c) standard deviation, and (d) expected value of the probability 
distribution, and (e) interpret the results. 


2. The table shows the distribution of family household sizes in the United 
States for a recent year. (Source: U.S. Census Bureau) 

x       2       3       4       5       6       7 
P(x)    0.427   0.227   0.200   0.093   0.034   0.018 


3. The table shows the distribution of fouls per game for a player in a recent 
NBA season. (Source: NBA.com) 


x       0       1       2       3       4       5       6 
P(x)    0.012   0.049   0.159   0.256   0.244   0.195   0.085 


4. Use the probability distribution in Exercise 3 to find the probability of 
randomly selecting a game in which the player had (a) fewer than four fouls, 
(b) at least three fouls, and (c) between two and four fouls, inclusive. 


5. From a pool of 16 candidates, 9 men and 7 women, the offices of president, 
vice president, secretary, and treasurer will be filled. (a) In how many 
different ways can the offices be filled? (b) What is the probability that all 
four of the offices are filled by women? 


In Exercises 6-11, use the Standard Normal Table to find the indicated area under 
the standard normal curve. 


6. To the left of z = 0.72 7. To the left of z = -3.08 
8. To the right of z = -0.84 9. Between z = 0 and z = 2.95 
10. Between z = -1.22 and z = -0.26 
11. To the left of z = 0.12 or to the right of z = 1.72 


12. Forty-five percent of adults say they are interested in regularly measuring 
their carbon footprint. You randomly select 11 adults and ask them if they are 
interested in regularly measuring their carbon footprint. Find the probability 
that the number of adults who say they are interested is (a) exactly eight, (b) 
at least five, and (c) less than two. Are any of these events unusual? Explain 
your reasoning. (Source: Sacred Heart University Polling) 


13. An auto parts seller finds that 1 in every 200 parts sold is defective. Use the 
geometric distribution to find the probability that (a) the first defective part 
is the tenth part sold, (b) the first defective part is the first, second, or third 
part sold, and (c) none of the first 10 parts sold are defective. 


14. The table shows the results of a survey in which 2,944,100 public and 401,900 
private school teachers were asked about their full-time teaching experience. 
(Adapted from U.S. National Center for Education Statistics) 


                      Public       Private     Total 
Less than 3 years     177,300      27,600      204,900 
3 to 9 years          995,800      154,500     1,150,300 
10 to 20 years        906,300      111,600     1,017,900 
20 years or more      864,700      108,200     972,900 
Total                 2,944,100    401,900     3,346,000 


(a) Find the probability that a randomly selected private school teacher has 
10 to 20 years of full-time teaching experience. 


(b) Given that a randomly selected teacher has 3 to 9 years of full-time 
experience, find the probability that the teacher is at a public school. 


(c) Are the events “being a public school teacher” and “having 20 years or 
more of full-time teaching experience” independent? Explain. 


(d) Find the probability that a randomly selected teacher is either at a public 
school or has less than 3 years of full-time teaching experience. 


(e) Find the probability that a randomly selected teacher has 3 to 9 years of 
full-time teaching experience or is at a private school. 


15. The initial pressures for bicycle tires when first filled are normally 
distributed, with a mean of 70 pounds per square inch (psi) and a standard 
deviation of 1.2 psi. 


(a) Random samples of size 40 are drawn from this population and the mean 
of each sample is determined. Use the Central Limit Theorem to find the 
mean and standard error of the mean of the sampling distribution. Then 
sketch a graph of the sampling distribution of sample means. 


(b) A random sample of 15 tires is drawn from this population. What is the 
probability that the mean tire pressure of the sample x̄ is less than 69 psi?


16. The life spans of car batteries are normally distributed, with a mean of 
44 months and a standard deviation of 5 months. 


(a) A car battery is selected at random. Find the probability that the life span
of the battery is less than 36 months. 


(b) A car battery is selected at random. Find the probability that the life span 
of the battery is between 42 and 60 months. 


(c) What is the shortest life expectancy a car battery can have and still be in 
the top 5% of life expectancies? 


17. A florist has 12 different flowers from which floral arrangements can be 
made. (a) If a centerpiece is to be made using four different flowers, how 
many different centerpieces can be made? (b) What is the probability that 
the four flowers in the centerpiece are roses, gerbers, hydrangeas, and callas? 


18. About fifty percent of adults say they feel vulnerable to identity theft. You 
randomly select 16 adults and ask them if they feel vulnerable to identity 
theft. Find the probability that the number who say they feel vulnerable is (a) 
exactly 12, (b) no more than 6, and (c) more than 7. Are any of these events 
unusual? Explain your reasoning. (Adapted from KRC Research for Fellowes) 
CONFIDENCE INTERVALS


6.1 Confidence Intervals for the Mean (Large Samples)
    CASE STUDY

6.2 Confidence Intervals for the Mean (Small Samples)
    ACTIVITY

6.3 Confidence Intervals for Population Proportions
    ACTIVITY

6.4 Confidence Intervals for Variance and Standard Deviation
    USES AND ABUSES
    REAL STATISTICS–REAL DECISIONS
    TECHNOLOGY


David Wechsler was one of the most influential 
psychologists of the 20th century. He is known 
for developing intelligence tests, such as the 
Wechsler Adult Intelligence Scale and the 
Wechsler Intelligence Scale for Children. 
WHERE YOU'VE BEEN


In Chapters 1 through 5, you studied descriptive 
statistics (how to collect and describe data) and 
probability (how to find probabilities and 
analyze discrete and continuous probability 
distributions). For instance, psychologists use 
descriptive statistics to analyze the data 
collected during experiments and trials. 


WHERE YOU'RE GOING


In this chapter, you will begin your study of 
inferential statistics—the second major branch 
of statistics. For instance, a chess club wants to 
estimate the mean IQ of its members. The mean 
of a random sample of members is 115. Because 
this estimate consists of a single number represented
by a point on a number line, it is called a
point estimate. The problem with using a point 
estimate is that it is rarely equal to the exact 
parameter (mean, standard deviation, or 
proportion) of the population. 


One of the most commonly administered 
psychological tests is the Wechsler Adult 
Intelligence Scale. It is an intelligence quotient 
(IQ) test that is standardized to have a normal 
distribution with a mean of 100 and a standard 
deviation of 15. 


In this chapter, you will learn how to make a 
more meaningful estimate by specifying an 
interval of values on a number line, together with 
a statement of how confident you are that your 
interval contains the population parameter. 
Suppose the club wants to be 90% confident of 
its estimate for the mean IQ of its members. 
Here is an overview of how to construct an 
interval estimate. 


1. Find the mean of a random sample: x̄ = 115
2. Find the margin of error: E = 3.3
3. Find the interval endpoints.
   Left: 115 − 3.3 = 111.7
   Right: 115 + 3.3 = 118.3
4. Form the interval estimate: 111.7 < μ < 118.3


So, the club can be 90% confident that the mean IQ of its members is between 111.7 and 118.3. 
WHAT YOU SHOULD LEARN


> How to find a point estimate and a margin of error

> How to construct and interpret confidence intervals for the population mean

> How to determine the minimum sample size required when estimating μ


Sample Data 


6.1 Confidence Intervals for the Mean (Large Samples)


Estimating Population Parameters > Confidence Intervals for the
Population Mean > Sample Size


> ESTIMATING POPULATION PARAMETERS 


In this chapter, you will learn an important technique of statistical inference:
using sample statistics to estimate the value of an unknown population parameter.
In this section, you will learn how to use sample statistics to make an estimate of
the population parameter μ when the sample size is at least 30 or when the
population is normally distributed and the standard deviation σ is known. To
make such an inference, begin by finding a point estimate.


DEFINITION 


A point estimate is a single value estimate for a population parameter. The
most unbiased point estimate of the population mean μ is the sample mean x̄.


The validity of an estimation method is increased if a sample statistic is unbiased
and has low variability. A statistic is unbiased if it does not overestimate or
underestimate the population parameter. In Chapter 5, you learned that the
mean of all possible sample means of the same size equals the population mean.
As a result, x̄ is an unbiased estimator of μ. Increasing n decreases the standard
error σ/√n, so the sample mean becomes less variable.
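
To see how the standard error shrinks as n grows, you can tabulate σ/√n for a few sample sizes. The following is a minimal Python sketch. The function name standard_error and the value σ = 15 are illustrative assumptions, not part of the text's examples.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# sigma = 15 is an assumed, illustrative population standard deviation.
for n in (30, 120, 480):
    print(n, round(standard_error(15, n), 3))
```

Because n appears under a square root, the sample size must be quadrupled to cut the standard error in half.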


EXAMPLE 1 


> Finding a Point Estimate 


A social networking website allows its users to add friends, send messages, and 
update their personal profiles. The following represents a random sample of 
the number of friends for 40 users of the website. Find a point estimate of the 
population mean μ. (Adapted from Facebook)


140 105 130 97 80 165 232 110 214 201 122 
98 65 88 154 133 121 82 130 211 153 114 
58 77 51 247 236 109 126 132 125 149 122 
74 59 218 192 90 117 105 


> Solution 

The sample mean of the data is

$$\bar{x} = \frac{\sum x}{n} = \frac{5232}{40} = 130.8.$$


So, the point estimate for the mean number of friends for all users of the 
website is 130.8 friends. 
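
If you prefer to check the arithmetic with software, the point estimate in Example 1 can be reproduced with a short Python sketch. The variable names friends and x_bar are illustrative. The data values are the 40 entries listed in the example.

```python
from statistics import mean

# Number of friends for the 40 sampled users (Example 1).
friends = [
    140, 105, 130, 97, 80, 165, 232, 110, 214, 201, 122,
    98, 65, 88, 154, 133, 121, 82, 130, 211, 153, 114,
    58, 77, 51, 247, 236, 109, 126, 132, 125, 149, 122,
    74, 59, 218, 192, 90, 117, 105,
]

x_bar = mean(friends)  # point estimate of the population mean
print(len(friends), sum(friends), x_bar)
```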


> Try It Yourself 1 


Another random sample of the number of friends for 30 users of the website 
is shown at the left. Use this sample to find another point estimate for μ.


a. Find the sample mean. 
b. Estimate the mean number of friends of the population. Answer: Page A39 
In Example 1, the probability that the population mean is exactly 130.8 is
virtually zero. So, instead of estimating μ to be exactly 130.8 using a point
estimate, you can estimate that μ lies in an interval. This is called making an
interval estimate.


DEFINITION 


An interval estimate is an interval, or range of values, used to estimate a 
population parameter. 


Although you can assume that the point estimate in Example 1 is not equal 
to the actual population mean, it is probably close to it. To form an interval 
estimate, use the point estimate as the center of the interval, then add and 
subtract a margin of error. For instance, if the margin of error is 15.7, then an
interval estimate would be given by 130.8 ± 15.7 or 115.1 < μ < 146.5. The
point estimate and interval estimate are as follows.


Left endpoint: 115.1     Point estimate: x̄ = 130.8     Right endpoint: 146.5

[Number line from 115 to 150 showing the interval estimate from 115.1 to 146.5, centered at the point estimate 130.8]


Before finding a margin of error for an interval estimate, you should first 
determine how confident you need to be that your interval estimate contains the 
population mean μ.


DEFINITION 


The level of confidence c is the probability that the interval estimate contains
the population parameter.


STUDY TIP

In this course, you will usually use 90%, 95%, and 99% levels of confidence.
The following z-scores correspond to these levels of confidence.

Level of Confidence     z_c
90%                     1.645
95%                     1.96
99%                     2.575


You know from the Central Limit Theorem that when n ≥ 30, the sampling
distribution of sample means is approximately a normal distribution. The level
of confidence c is the area under the standard normal curve between the critical
values, −z_c and z_c. Critical values are values that separate sample statistics
that are probable from sample statistics that are improbable, or unusual. You can
see from the graph that c is the percent of the area under the normal curve
between −z_c and z_c. The area remaining is 1 − c, so the area in each tail is
½(1 − c). For instance, if c = 90%, then 5% of the area lies to the left of
−z_c = −1.645 and 5% lies to the right of z_c = 1.645.


[Figure: standard normal curve for c = 0.90. The blue region between −z_c = −1.645 and z_c = 1.645 has area c = 0.90. Each yellow tail has area ½(1 − c) = 0.05. The critical values −z_c and z_c separate the left and right tails from the central region.]
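
The relationship between c and z_c can also be checked numerically. The sketch below uses Python's standard-library NormalDist class. The function name critical_value is an illustrative assumption. Because the area to the left of z_c is c plus one tail, ½(1 − c), the function looks up the z-score whose cumulative area is 0.5 + c/2.

```python
from statistics import NormalDist


def critical_value(c: float) -> float:
    """z_c such that the area between -z_c and z_c is c."""
    # Area to the left of z_c = c + (1 - c)/2 = 0.5 + c/2.
    return NormalDist().inv_cdf(0.5 + c / 2)


for c in (0.90, 0.95, 0.99):
    print(f"c = {c:.2f}  z_c = {critical_value(c):.3f}")
```

Software carries more decimal places than a printed table, so the value computed for c = 0.99 rounds to 2.576, slightly different from the tabled 2.575.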
Many investors choose mutual
funds as a way to invest in the 
stock market. The mean annual 
rate of return for mutual funds 
in a recent year was estimated 
by taking a random sample of 
44 mutual funds. The mean 
annual rate of return for the 
sample was 14.73%, with a 
standard deviation of 7.23%. 
(Source: Marketwatch, Inc.) 


[Histogram: frequency versus rate of return (in percent)]


For a 95% confidence interval, 
what would be the margin of 
error for the population mean 
rate of return? 


STUDY TIP


Remember that you can calculate
the sample standard deviation s
using the formula

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

or the shortcut formula

$$s = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}}.$$

However, the most convenient
way to find the sample
standard deviation is
to use the 1-Var Stats
feature of a graphing
calculator.
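
As a check that the defining formula and the shortcut formula for s agree, here is a minimal Python sketch. The function names and the eight-value sample are illustrative assumptions. The standard-library function statistics.stdev is used only as an independent comparison.

```python
import math
from statistics import stdev


def stdev_definition(data):
    """s from the defining formula: sqrt(sum((x - x_bar)^2) / (n - 1))."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


def stdev_shortcut(data):
    """s from the shortcut formula: sqrt((sum(x^2) - (sum(x))^2 / n) / (n - 1))."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # assumed illustrative data
print(stdev_definition(sample), stdev_shortcut(sample), stdev(sample))
```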


The difference between the point estimate and the actual parameter value
is called the sampling error. When μ is estimated, the sampling error is the
difference x̄ − μ. In most cases, of course, μ is unknown, and x̄ varies from
sample to sample. However, you can calculate a maximum value for the error
if you know the level of confidence and the sampling distribution.


DEFINITION 


Given a level of confidence c, the margin of error E (sometimes also called
the maximum error of estimate or error tolerance) is the greatest possible
distance between the point estimate and the value of the parameter it is
estimating.

$$E = z_c \sigma_{\bar{x}} = z_c \frac{\sigma}{\sqrt{n}}$$

In order to use this technique, it is assumed that the population standard
deviation is known. This is rarely the case, but when n ≥ 30, the sample
standard deviation s can be used in place of σ.


EXAMPLE 2 


> Finding the Margin of Error 


Use the data given in Example 1 and a 95% confidence level to find the 
margin of error for the mean number of friends for all users of the website. 
Assume that the sample standard deviation is about 53.0. 


> Solution 


The z-score that corresponds to a 95% confidence level is 1.96. This
implies that 95% of the area under the standard normal curve falls within
1.96 standard deviations of the mean. (You can approximate the distribution
of the sample means with a normal curve by the Central Limit Theorem
because n = 40 ≥ 30.) You don’t know the population standard deviation σ.
But because n ≥ 30, you can use s in place of σ.


Using the values z_c = 1.96, σ ≈ s ≈ 53.0, and n = 40,

$$E = z_c \frac{\sigma}{\sqrt{n}} \approx 1.96 \cdot \frac{53.0}{\sqrt{40}} \approx 16.4.$$


[Figure: standard normal curve centered at z = 0 with area 0.95 between −z_c = −1.96 and z_c = 1.96 and area 0.025 in each tail]


Interpretation You are 95% confident that the margin of error for the 
population mean is about 16.4 friends. 
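
The margin-of-error calculation in Example 2 can be reproduced with a few lines of Python. This is a sketch under the example's stated values (z_c = 1.96, s ≈ 53.0, n = 40). The function name margin_of_error is an illustrative assumption.

```python
import math


def margin_of_error(z_c: float, sigma: float, n: int) -> float:
    """Margin of error E = z_c * sigma / sqrt(n)."""
    return z_c * sigma / math.sqrt(n)


# Example 2 values; s is used in place of sigma because n >= 30.
E = margin_of_error(z_c=1.96, sigma=53.0, n=40)
print(round(E, 1))
```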


> Try It Yourself 2 


Use the data given in Try It Yourself 1 and a 95% confidence level to find 
the margin of error for the mean number of friends for all users of the website. 


a. Identify z_c, n, and s.
b. Find E using z_c, σ ≈ s, and n.
c. Interpret the results. Answer: Page A39 
STUDY TIP


When you compute a confidence interval for a population mean, the
general round-off rule is to round off to the same number of decimal
places given for the sample mean. Recall that rounding is done
in the final step.


STUDY TIP


Other ways to represent a confidence interval are
(x̄ − E, x̄ + E) and x̄ ± E. For instance, in Example 3,
you could write the confidence interval as (114.4, 147.2)
or 130.8 ± 16.4.
> CONFIDENCE INTERVALS FOR THE POPULATION MEAN


Using a point estimate and a margin of error, you can construct an interval
estimate of a population parameter such as μ. This interval estimate is called a
confidence interval.


DEFINITION 


A c-confidence interval for the population mean μ is

$$\bar{x} - E < \mu < \bar{x} + E.$$

The probability that the confidence interval contains μ is c.


GUIDELINES 


Finding a Confidence Interval for a Population Mean (n ≥ 30 or σ known
with a normally distributed population)


IN WORDS                                           IN SYMBOLS

1. Find the sample statistics n and x̄.            x̄ = Σx / n

2. Specify σ, if known. Otherwise, if n ≥ 30,      s = √( Σ(x − x̄)² / (n − 1) )
   find the sample standard deviation s and
   use it as an estimate for σ.

3. Find the critical value z_c that corresponds    Use the Standard Normal
   to the given level of confidence.               Table or technology.

4. Find the margin of error E.                     E = z_c · σ/√n

5. Find the left and right endpoints               Left endpoint: x̄ − E
   and form the confidence interval.               Right endpoint: x̄ + E
                                                   Interval: x̄ − E < μ < x̄ + E


EXAMPLE 3                                  See MINITAB steps on page 352.


> Constructing a Confidence Interval


Use the data given in Example 1 to construct a 95% confidence interval for the
mean number of friends for all users of the website.


> Solution In Examples 1 and 2, you found that x̄ = 130.8 and E ≈ 16.4.
The confidence interval is as follows.


Left Endpoint                        Right Endpoint
x̄ − E ≈ 130.8 − 16.4 = 114.4         x̄ + E ≈ 130.8 + 16.4 = 147.2


114.4 < μ < 147.2

[Number line from 110 to 150 showing the interval from 114.4 to 147.2, centered at 130.8]


Interpretation With 95% confidence, you can say that the population mean 
number of friends is between 114.4 and 147.2. 
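
The steps for finding a confidence interval can be sketched in Python as follows. The function name z_interval and its parameter names are illustrative assumptions. The inputs are the Example 3 values (x̄ = 130.8, s ≈ 53.0, n = 40, z_c = 1.96).

```python
import math


def z_interval(x_bar: float, sigma: float, n: int, z_c: float):
    """Return the (left, right) endpoints of the interval x_bar ± E."""
    E = z_c * sigma / math.sqrt(n)  # margin of error
    return x_bar - E, x_bar + E


left, right = z_interval(x_bar=130.8, sigma=53.0, n=40, z_c=1.96)
# Round only in the final step, to the same number of decimal places as x_bar.
print(f"{left:.1f} < mu < {right:.1f}")
```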
> Try It Yourself 3


Use the data given in Try It Yourself 1 to construct a 95% confidence interval
for the mean number of friends for all users of the website. Compare your
result with the interval found in Example 3.

a. Find x̄ and E.
b. Find the left and right endpoints of the confidence interval.
c. Interpret the results and compare them with Example 3.     Answer: Page A39


INSIGHT

The width of a confidence interval is 2E. Examine the formula for E to
see why a larger sample size tends to give you a narrower confidence interval
for the same level of confidence.


EXAMPLE 4


> Constructing a Confidence Interval Using Technology


Use a technology tool to construct a 99% confidence interval for the mean
number of friends for all users of the website using the sample in Example 1.

> Solution 


To use a technology tool to solve the problem, enter the data and recall that
the sample standard deviation is s ≈ 53.0. Then, use the confidence interval
command to calculate the confidence interval (1-Sample Z for MINITAB).
The display should look like the one shown below. To construct a confidence
interval using a TI-83/84 Plus, follow the instructions in the margin.


MINITAB

One-Sample Z: Friends

The assumed standard deviation = 53

Variable    N     Mean    StDev   SE Mean   99% CI
Friends    40   130.80    52.63      8.38   (109.21, 152.39)


So, a 99% confidence interval for μ is (109.2, 152.4).


Interpretation With 99% confidence, you can say that the population mean
number of friends is between 109.2 and 152.4.


STUDY TIP

Using a TI-83/84 Plus, you can either enter the original data into a list to
construct the confidence interval or enter the descriptive statistics.

STAT
Choose the TESTS menu.
7: ZInterval...

Select the Data input option if you use the original data. Select the Stats
input option if you use the descriptive statistics. In each case, enter the
appropriate values, then select Calculate. Your results may differ slightly
depending on the method you use. For Example 4, the original data values were
entered.


> Try It Yourself 4 


Use the sample data in Example 1 and a technology tool to construct 75%, 
85%, and 99% confidence intervals for the mean number of friends for all 
users of the website. How does the width of the confidence interval change as 
the level of confidence increases? 


a. Enter the data. 

b. Use the appropriate command to construct each confidence interval. 

c. Compare the widths of the confidence intervals for c = 0.75, 0.85, and 0.99. 
Answer: Page A39 


In Example 4 and Try It Yourself 4, the same sample data were used to 
construct confidence intervals with different levels of confidence. Notice that as 
the level of confidence increases, the width of the confidence interval also 
increases. In other words, when the same sample data are used, the greater the 
level of confidence, the wider the interval. 
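
To see this effect numerically, the sketch below computes the width 2E at three levels of confidence for the Example 1 sample (s ≈ 53.0, n = 40). The function name interval_width and the chosen levels are illustrative assumptions. Critical values come from Python's standard-library NormalDist rather than a printed table, so the results may differ slightly from hand calculations.

```python
import math
from statistics import NormalDist


def interval_width(c: float, sigma: float, n: int) -> float:
    """Width 2E of a c-confidence interval for the mean."""
    z_c = NormalDist().inv_cdf(0.5 + c / 2)  # critical value for level c
    return 2 * z_c * sigma / math.sqrt(n)


widths = {c: interval_width(c, sigma=53.0, n=40) for c in (0.90, 0.95, 0.99)}
for c, w in widths.items():
    print(f"c = {c:.2f}  width = {w:.1f}")
```

Only z_c changes from one level to the next, so the widths grow in the same proportion as the critical values.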
If the population is normally distributed and the population standard
deviation σ is known, you may use the normal sampling distribution for any
sample size, as shown in Example 5.


EXAMPLE 5                                  See TI-83/84 Plus steps on page 353.


> Constructing a Confidence Interval, σ Known


A college admissions director wishes to estimate the mean age of all students
currently enrolled. In a random sample of 20 students, the mean age is found to
be 22.9 years. From past studies, the standard deviation is known to be 1.5 years,
and the population is normally distributed. Construct a 90% confidence interval
of the population mean age.


> Solution


Using n = 20, x̄ = 22.9, σ = 1.5, and z_c = 1.645, the margin of error at the
90% confidence level is

$$E = z_c \frac{\sigma}{\sqrt{n}} = 1.645 \cdot \frac{1.5}{\sqrt{20}} \approx 0.6.$$

The 90% confidence interval can be written as x̄ ± E ≈ 22.9 ± 0.6 or as
follows.


Left Endpoint                      Right Endpoint
x̄ − E ≈ 22.9 − 0.6 = 22.3          x̄ + E ≈ 22.9 + 0.6 = 23.5


22.3 < μ < 23.5

[Number line showing the interval from 22.3 to 23.5, centered at 22.9]


STUDY TIP

Here are instructions for constructing a confidence interval in Excel.
First, click Insert at the top of the screen and select Function. Select
the category Statistical and select the Confidence function. In the
dialog box, enter the values of alpha, the standard deviation, and
the sample size. Then click OK. The value returned is the margin
of error, which is used to construct the confidence interval.

        A
1       =CONFIDENCE(0.1,1.5,20)
2       0.55170068

Alpha is the level of significance, which will be explained
in Chapter 7. When using Excel in Chapter 6, you can think of alpha as
the complement of the level of confidence. So, for a 90% confidence
interval, alpha is equal to 1 − 0.90 = 0.10.

Interpretation With 90% confidence, you can say that the mean age of all the 
students is between 22.3 and 23.5 years. 
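
The margin of error that Excel reports for Example 5 can be reproduced in Python. The sketch below is not Excel's implementation. It is an assumed equivalent that takes the same three inputs (alpha, the standard deviation, and the sample size) and uses the standard-library NormalDist class to find the critical value. The function name confidence is chosen here only to mirror the Excel call.

```python
import math
from statistics import NormalDist


def confidence(alpha: float, sigma: float, n: int) -> float:
    """Margin of error for a (1 - alpha) confidence interval, sigma known."""
    z_c = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    return z_c * sigma / math.sqrt(n)


E = confidence(alpha=0.10, sigma=1.5, n=20)
x_bar = 22.9
print(round(E, 8))
print(round(x_bar - E, 1), round(x_bar + E, 1))
```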


> Try It Yourself 5 


Construct a 90% confidence interval of the population mean age for the 
college students in Example 5 with the sample size increased to 30 students. 
Compare your answer with Example 5. 


a. Identify n, x̄, σ, and z_c, and find E.
b. Find the left and right endpoints of the confidence interval.
c. Interpret the results and compare them with Example 5.     Answer: Page A39


After constructing a confidence interval, it is important that you interpret the
results correctly. Consider the 90% confidence interval constructed in Example 5.
Because μ is a fixed value predetermined by the population, it is either in the
interval or not. It is not correct to say “There is a 90% probability that the actual
mean will be in the interval (22.3, 23.5).” This statement is wrong because it
suggests that the value of μ can vary, which is not true. The correct way to
interpret your confidence interval is “If a large number of samples is collected
and a confidence interval is created for each sample, approximately 90% of these
intervals will contain μ.”


[Figure: The horizontal segments represent 90% confidence intervals for different samples of the same size. In the long run, 9 of every 10 such intervals will contain μ.]
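
This long-run interpretation can be illustrated with a small simulation. The sketch below draws many samples of size 20 from a normal population and counts how often the 90% interval captures the population mean. The population mean of 23 is a hypothetical value chosen only so that the simulation has a known target. In practice μ is unknown. The function name coverage_rate, the number of trials, and the seed are also illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(mu, sigma, n, c, trials, seed=0):
    """Fraction of simulated c-confidence intervals that contain mu."""
    rng = random.Random(seed)
    z_c = NormalDist().inv_cdf(0.5 + c / 2)
    E = z_c * sigma / math.sqrt(n)  # same margin of error for every sample
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - E < mu < x_bar + E:
            hits += 1
    return hits / trials


# mu = 23 is hypothetical; sigma = 1.5 and n = 20 follow Example 5.
rate = coverage_rate(mu=23.0, sigma=1.5, n=20, c=0.90, trials=2000)
print(rate)
```

Each individual interval either contains μ or does not. Only the proportion across many samples settles near 0.90.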
INSIGHT


Using the formula for the
margin of error,

$$E = z_c \frac{\sigma}{\sqrt{n}},$$

you can derive the
minimum sample size n.
(See Exercise 69.)


STUDY TIP


When necessary, round up
to obtain a whole number
when determining a
minimum sample size.
> SAMPLE SIZE


For the same sample statistics, as the level of confidence increases, the confidence 
interval widens. As the confidence interval widens, the precision of the estimate 
decreases. One way to improve the precision of an estimate without decreasing 
the level of confidence is to increase the sample size. But how large a sample size 
is needed to guarantee a certain level of confidence for a given margin of error? 


FIND A MINIMUM SAMPLE SIZE TO ESTIMATE μ


Given a c-confidence level and a margin of error E, the minimum sample size
n needed to estimate the population mean μ is

$$n = \left( \frac{z_c \sigma}{E} \right)^2.$$

If σ is unknown, you can estimate it using s, provided you have a preliminary
sample with at least 30 members.
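
The sample size formula follows from solving the margin of error formula for n. A brief sketch of the algebra, using the same symbols as above:

```latex
\begin{align*}
E &= z_c \frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{z_c \sigma}{E} \\
n &= \left( \frac{z_c \sigma}{E} \right)^2
\end{align*}
```

Because E is in the denominator, halving the desired margin of error multiplies the required sample size by four.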


EXAMPLE 6 


> Determining a Minimum Sample Size 

You want to estimate the mean number of friends for all users of the website. 
How many users must be included in the sample if you want to be 95% confident 
that the sample mean is within seven friends of the population mean? 


> Solution 
Using c = 0.95, z_c = 1.96, σ ≈ s ≈ 53.0 (from Example 2), and E = 7, you
can solve for the minimum sample size n.

$$n = \left( \frac{z_c \sigma}{E} \right)^2 \approx \left( \frac{1.96 \cdot 53.0}{7} \right)^2 \approx 220.23$$


When necessary, round up to obtain a whole number. So, you should include 
at least 221 users in your sample. 


Interpretation You already have 40, so you need 181 more. Note that 221 is 
the minimum number of users to include in the sample. You could include 
more, if desired. 
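
The calculation in Example 6, including the round-up step, can be sketched in Python as follows. The function name minimum_sample_size is an illustrative assumption. The inputs are the example's values (z_c = 1.96, s ≈ 53.0, E = 7).

```python
import math


def minimum_sample_size(z_c: float, sigma: float, E: float) -> int:
    """Smallest whole n with z_c * sigma / sqrt(n) <= E."""
    return math.ceil((z_c * sigma / E) ** 2)  # always round up


n = minimum_sample_size(z_c=1.96, sigma=53.0, E=7)
print(n, n - 40)  # required sample size, and how many more than the 40 already sampled
```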


> Try It Yourself 6 

How many users must be included in the sample if you want to be 95% 
confident that the sample mean is within 10 users of the population mean? 
Compare your answer with Example 6. 


a. Identify z_c, E, and s.
b. Use z_c, E, and σ ≈ s to find the minimum sample size n.
c. Interpret the results and compare them with Example 6. 
Answer: Page A40 
6.1 Exercises


BUILDING BASIC SKILLS AND VOCABULARY


1. When estimating a population mean, are you more likely to be correct if you 
use a point estimate or an interval estimate? Explain your reasoning. 


2. A news reporter reports the results of a survey and states that 45% of those
surveyed responded “yes” with a margin of error of “plus or minus 5%.”
Explain what this means.


3. Given the same sample statistics, which level of confidence would produce 
the widest confidence interval? Explain your reasoning. 


(a) 90% (b) 95% (c) 98% (d) 99% 


4. You construct a 95% confidence interval for a population mean using a 
random sample. The confidence interval is 24.9 < μ < 31.5. Is the probability
that μ is in this interval 0.95? Explain.


In Exercises 5-8, find the critical value z_c necessary to construct a confidence
interval at the given level of confidence. 


5. c = 0.80 6. c = 0.85 7. c = 0.75 8. c = 0.97 


Graphical Analysis In Exercises 9-12, use the values on the number line to
find the sampling error.


9. x̄ = 3.8, μ = 4.27            10. μ = 8.76, x̄ = 9.5
11. μ = 24.67, x̄ = 26.43        12. x̄ = 46.56, μ = 48.12


In Exercises 13-16, find the margin of error for the given values of c, s, and n.
13. c = 0.95, s = 5.2, n = 30      14. c = 0.90, s = 2.9, n = 50
15. c = 0.80, s = 1.3, n = 75      16. c = 0.975, s = 4.6, n = 100


Matching In Exercises 17-20, match the level of confidence c with its
representation on the number line, given x̄ = 57.2, s = 7.1, and n = 50.


17. c = 0.88     18. c = 0.90     19. c = 0.95     20. c = 0.98

(a) Interval from 54.9 to 59.5, centered at 57.2
(b) Interval from 55.2 to 59.2, centered at 57.2
(c) Interval from 55.6 to 58.8, centered at 57.2
(d) Interval from 55.5 to 58.9, centered at 57.2

[Each interval is drawn on a number line from 54 to 60.]


In Exercises 21-24, construct the indicated confidence interval for the population
mean μ. If convenient, use technology to construct the confidence interval.


21. c = 0.90, x̄ = 12.3, s = 1.5, n = 50
22. c = 0.95, x̄ = 31.39, s = 0.8, n = 82
23. c = 0.99, x̄ = 10.5, s = 2.14, n = 45
24. c = 0.80, x̄ = 20.6, s = 4.7, n = 100
In Exercises 25-28, use the given confidence interval to find the margin of error
and the sample mean. 


25. (12.0, 14.8) 26. (21.61, 30.15) 
27. (1.71, 2.05) 28. (3.144, 3.176) 


In Exercises 29-32, determine the minimum sample size n needed to estimate μ for
the given values of c, s, and E.


29. c = 0.90, s = 68, E = 1        30. c = 0.95, s = 2.5, E = 1
31. c = 0.80, s = 4.1, E = 2       32. c = 0.98, s = 101, E = 2


USING AND INTERPRETING CONCEPTS


Finding the Margin of Error In Exercises 33 and 34, use the given confidence
interval to find the estimated margin of error. Then find the sample mean.


33. Commute Times A government agency reports a confidence interval of 
(26.2, 30.1) when estimating the mean commute time (in minutes) for the 
population of workers in a city. 


34. Book Prices A store manager reports a confidence interval of 
(44.07, 80.97) when estimating the mean price (in dollars) for the population 
of textbooks. 


Constructing Confidence Intervals In Exercises 35-38, you are given the
sample mean and the sample standard deviation. Use this information to construct 
the 90% and 95% confidence intervals for the population mean. Interpret the 
results and compare the widths of the confidence intervals. If convenient, use 
technology to construct the confidence intervals. 


35. Home Theater Systems A random sample of 34 home theater systems has 
a mean price of $452.80 and a standard deviation of $85.50. 


36. Gasoline Prices From a random sample of 48 days in a recent year, U.S. 
gasoline prices had a mean of $2.34 and a standard deviation of $0.32. 
(Source: U.S. Energy Information Administration) 


37. Juice Drinks A random sample of 31 eight-ounce servings of different juice 
drinks has a mean of 99.3 calories and a standard deviation of 41.5 calories. 
(Adapted from The Beverage Institute for Health and Wellness) 


38. Sodium Chloride Concentration In 36 randomly selected seawater 
samples, the mean sodium chloride concentration was 23 cubic centimeters 
per cubic meter and the standard deviation was 6.7 cubic centimeters per 
cubic meter. (Adapted from Dorling Kindersley Visual Encyclopedia) 


39. Replacement Costs: Transmissions You work for a consumer advocate 
agency and want to estimate the population mean cost of replacing a car’s 
transmission. As part of your study, you randomly select 50 replacement costs 
and find the mean to be $2650.00. The sample standard deviation is $425.00. 
Construct a 95% confidence interval for the population mean replacement 
cost. Interpret the results. (Adapted from CostHelper) 


40. Repair Costs: Refrigerators In a random sample of 60 refrigerators, the 
mean repair cost was $150.00 and the standard deviation was $15.50. 
Construct a 99% confidence interval for the population mean repair cost. 
Interpret the results. (Adapted from Consumer Reports) 
41. Repeat Exercise 39, changing the sample size to n = 80. Which confidence
interval is wider? Explain.


42. Repeat Exercise 40, changing the sample size to n = 40. Which confidence
interval is wider? Explain.


43. Swimming Times A random sample of forty-eight 200-meter swims has a
mean time of 3.12 minutes and a standard deviation of 0.09 minute.
Construct a 95% confidence interval for the population mean time. Interpret
the results.


44. Hotels A random sample of 55 standard hotel rooms in the Philadelphia,
PA area has a mean nightly cost of $154.17 and a standard deviation of
$38.60. Construct a 99% confidence interval for the population mean cost.
Interpret the results.


45. Repeat Exercise 43, using a standard deviation of s = 0.06 minute. Which
confidence interval is wider? Explain.


46. Repeat Exercise 44, using a standard deviation of s = $42.50. Which
confidence interval is wider? Explain.


47. If all other quantities remain the same, how does the indicated change affect
the width of a confidence interval?

(a) Increase in the level of confidence
(b) Increase in the sample size
(c) Increase in the standard deviation


48. Describe how you would construct a 90% confidence interval to estimate the
population mean age for students at your school.


Constructing Confidence Intervals In Exercises 49 and 50, use the given
information to construct the 90% and 99% confidence intervals for the population 
mean. Interpret the results and compare the widths of the confidence intervals. 
If convenient, use technology to construct the confidence intervals. 


49. DVRs A research council wants to estimate the mean length of time
(in minutes) the average U.S. adult spends watching TVs using digital
video recorders (DVRs) each day. To determine this estimate, the research
council takes a random sample of 20 U.S. adults and obtains the following
results.


15, 18, 17, 20, 24, 12, 9, 15, 14, 25, 8, 6, 10, 14, 16, 20, 27, 10, 9, 13


From past studies, the research council assumes that σ is 1.3 minutes and that
the population of times is normally distributed. (Adapted from the Council for
Research Excellence)


50. Text Messaging A telecommunications company wants to estimate the
mean length of time (in minutes) that 18- to 24-year-olds spend text
messaging each day. In a random sample of twenty-seven 18- to 24-year-olds,
the mean length of time spent text messaging was 29 minutes. From past
studies, the company assumes that σ is 4.5 minutes and that the population
of times is normally distributed. (Adapted from the Council for Research
Excellence)


51. Minimum Sample Size Determine the minimum required sample size if
you want to be 95% confident that the sample mean is within one unit of
the population mean given σ = 4.8. Assume the population is normally
distributed.
[Figure for Exercise 55: error tolerance = 0.25 oz]

[Figure: error tolerance = 1 mL]


52. Minimum Sample Size Determine the minimum required sample size if
you want to be 99% confident that the sample mean is within two units of
the population mean given σ = 1.4. Assume the population is normally
distributed.


53. Cholesterol Contents of Cheese A cheese processing company wants
to estimate the mean cholesterol content of all one-ounce servings of cheese.
The estimate must be within 0.5 milligram of the population mean.


(a) Determine the minimum required sample size to construct a 95% 
confidence interval for the population mean. Assume the population 
standard deviation is 2.8 milligrams. 


(b) Repeat part (a) using a 99% confidence interval. 
(c) Which level of confidence requires a larger sample size? Explain. 


54. Ages of College Students An admissions director wants to estimate the mean
age of all students enrolled at a college. The estimate must be within 1 year of
the population mean. Assume the population of ages is normally distributed.


(a) Determine the minimum required sample size to construct a 90% 
confidence interval for the population mean. Assume the population 
standard deviation is 1.2 years. 


(b) Repeat part (a) using a 99% confidence interval. 
(c) Which level of confidence requires a larger sample size? Explain. 


55. Paint Can Volumes A paint manufacturer uses a machine to fill gallon cans
with paint (see figure).


(a) The manufacturer wants to estimate the mean volume of paint the 
machine is putting in the cans within 0.25 ounce. Determine the 
minimum sample size required to construct a 90% confidence interval 
for the population mean. Assume the population standard deviation is 
0.85 ounce. 


(b) Repeat part (a) using an error tolerance of 0.15 ounce. Which error 
tolerance requires a larger sample size? Explain. 


56. Water Dispensing Machine A beverage company uses a machine to fill
one-liter bottles with water (see figure). Assume that the population of
volumes is normally distributed.


(a) The company wants to estimate the mean volume of water the machine 
is putting in the bottles within 1 milliliter. Determine the minimum 
sample size required to construct a 95% confidence interval for 
the population mean. Assume the population standard deviation is 
3 milliliters. 


(b) Repeat part (a) using an error tolerance of 2 milliliters. Which error 
tolerance requires a larger sample size? Explain. 


57. Plastic Sheet Cutting A machine cuts plastic into sheets that are 50 feet
(600 inches) long. Assume that the population of lengths is normally distributed.


(a) The company wants to estimate the mean length of the sheets within 
0.125 inch. Determine the minimum sample size required to construct 
a 95% confidence interval for the population mean. Assume the 
population standard deviation is 0.25 inch. 


(b) Repeat part (a) using an error tolerance of 0.0625 inch. Which error 
tolerance requires a larger sample size? Explain. 




FIGURE FOR EXERCISE 63

18 | 33          Key: 18|3 = 183
19 | 7
20 | 99
21 | 222333333366
22 | 2222366888889
23 | 88

58. Paint Sprayer A company uses an automated sprayer to apply paint to
metal furniture. The company sets the sprayer to apply the paint one mil
(1/1000 of an inch) thick.


(a) The company wants to estimate the mean thickness of paint the 
sprayer is applying within 0.0425 mil. Determine the minimum sample 
size required to construct a 90% confidence interval for the population 
mean. Assume the population standard deviation is 0.15 mil. 


(b) Repeat part (a) using an error tolerance of 0.02125 mil. Which error 
tolerance requires a larger sample size? Explain. 


59. Soccer Balls A soccer ball manufacturer wants to estimate the mean
circumference of soccer balls within 0.1 inch.


(a) Determine the minimum sample size required to construct a 99% 
confidence interval for the population mean. Assume the population 
standard deviation is 0.25 inch. 


(b) Repeat part (a) using a standard deviation of 0.3 inch. Which standard 
deviation requires a larger sample size? Explain. 


60. Mini-Soccer Balls A soccer ball manufacturer wants to estimate the mean
circumference of mini-soccer balls within 0.15 inch. Assume that the
population of circumferences is normally distributed.


(a) Determine the minimum sample size required to construct a 99% 
confidence interval for the population mean. Assume the population 
standard deviation is 0.20 inch. 


(b) Repeat part (a) using a standard deviation of 0.10 inch. Which standard 
deviation requires a larger sample size? Explain. 


61. If all other quantities remain the same, how does the indicated change affect
the minimum sample size requirement?


(a) Increase in the level of confidence 
(b) Increase in the error tolerance 
(c) Increase in the standard deviation 


62. When estimating the population mean, why not construct a 99% confidence
interval every time?


Using Technology In Exercises 63 and 64, you are given a data sample. Use a
technology tool to construct a 95% confidence interval for the population mean. 
Interpret your answer. 




63. Airfare The stem-and-leaf plot shows the results of a random sample 
of airfare prices (in dollars) for a one-way ticket from Boston, MA to 
Chicago, IL (Adapted from Expedia, Inc.) 


64. Stock Prices A random sample of the closing stock prices for the 
Oracle Corporation for a recent year (Source: Yahoo! Inc.) 


18.41 16.91 16.83 17.72 15.54 15.56 18.01 19.11 19.79 
18.32 18.65 20.71 20.66 21.04 21.74 22.13 21.96 22.16 
22.86 20.86 20.74 22.05 21.42 22.34 22.83 24.34 17.97 
14.47 19.06 18.42 20.85 21.43 21.97 21.81 




In Exercises 65 and 66, use StatCrunch to construct the 80%, 90%, and 95%
confidence intervals for the population mean. Interpret the results. 


65. Sodium A random sample of 30 sandwiches from a fast food restaurant has
a mean of 1042.7 milligrams of sodium and a standard deviation of 344.9
milligrams of sodium. (Source: McDonald’s Corporation)


66. Carbohydrates The following represents a random sample of the amounts
of carbohydrates (in grams) for 30 sandwiches from a fast food restaurant.
(Source: McDonald’s Corporation)


31 33 34 33 37 40 40 45 37 38 63 61 59 38 40 
44 51 59 52 60 54 62 39 33 26 34 27 35 28 26 


EXTENDING CONCEPTS


Finite Population Correction Factor In Exercises 67 and 68, use the 
following information. 


In this section, you studied the construction of a confidence interval to
estimate a population mean when the population is large or infinite. When a
population is finite, the formula that determines the standard error of the
mean $\sigma_{\bar{x}}$ needs to be adjusted. If N is the size of the population and n is the
size of the sample (where n ≥ 0.05N), the standard error of the mean is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}.$$

The expression $\sqrt{(N - n)/(N - 1)}$ is called the finite population correction
factor. The margin of error is

$$E = z_c \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}.$$

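The correction factor and the adjusted margin of error can be sketched in a few lines of Python. This is only an illustration: the function names and the sample values (N = 200, n = 40, $\sigma$ = 5, $z_c$ = 1.96) are assumptions chosen for the sketch and do not come from the exercises.

```python
import math

def finite_population_correction(N, n):
    """Return sqrt((N - n) / (N - 1)) for population size N and sample size n."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return math.sqrt((N - n) / (N - 1))

def margin_of_error_finite(z_c, sigma, N, n):
    """Margin of error E = z_c * (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))."""
    return z_c * sigma / math.sqrt(n) * finite_population_correction(N, n)

# Hypothetical values, chosen only to illustrate the formulas.
fpc = finite_population_correction(200, 40)
E_finite = margin_of_error_finite(1.96, 5.0, 200, 40)
E_infinite = 1.96 * 5.0 / math.sqrt(40)
print(round(fpc, 4), round(E_finite, 3), round(E_infinite, 3))
```

Because the factor is less than 1 whenever n > 1, the adjusted margin of error is smaller than the unadjusted one.
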
67. Determine the finite population correction factor for each of the following. 
(a) N = 1000 and n = 500 (b) N = 1000 and n = 100
(c) N = 1000 and n = 75 (d) N = 1000 and n = 50

(e) What happens to the finite population correction factor as the sample
size n decreases but the population size N remains the same?


68. Determine the finite population correction factor for each of the following. 


(a) N = 100 and n = 50 (b) N = 400 and n = 50
(c) N = 700 and n = 50 (d) N = 1000 and n = 50

(e) What happens to the finite population correction factor as the
population size N increases but the sample size n remains the same?


69. Sample Size The equation for determining the sample size

$$n = \left(\frac{z_c \sigma}{E}\right)^2$$

can be obtained by solving the equation for the margin of error

$$E = \frac{z_c \sigma}{\sqrt{n}}$$

for n. Show that this is true and justify each step.

Marathon Training


A marathon is a foot race with a distance of 26.22 miles. It was one of the original events of the 
modern Olympics, where it was a men’s-only event. The women’s marathon did not become an 
Olympic event until 1984. The Olympic record for the men’s marathon was set during the 2008 
Olympics by Samuel Kamau Wanjiru of Kenya, with a time of 2 hours, 6 minutes, 32 seconds. 
The Olympic record for the women’s marathon was set during the 2000 Olympics by Naoko 
Takahashi of Japan, with a time of 2 hours, 23 minutes, 14 seconds. 

Training for a marathon typically lasts at least 6 months. The training is gradual, with increases 
in distance about every 2 weeks. About 1 to 3 weeks before the race, the distance run is decreased 
slightly. The stem-and-leaf plots below show the marathon training times (in minutes) for a sample 
of 30 male runners and 30 female runners. 


Training Times (in minutes) of Male Runners

15 | 58999          Key: 15|5 = 155
16 | 000012344589
17 | 0113566779
18 | 015


Training Times (in minutes) of Female Runners

17 | 899            Key: 17|8 = 178
18 | 000012346679
19 | 0001345566
20 | 00123


EXERCISES


1. Use the sample to find a point estimate for the mean training time of the
(a) male runners.
(b) female runners.

2. Find the standard deviation of the training times for the
(a) male runners.
(b) female runners.

3. Use the sample to construct a 95% confidence interval for the population mean
training time of the
(a) male runners.
(b) female runners.

4. Interpret the results of Exercise 3.

5. Use the sample to construct a 95% confidence interval for the population mean
training time of all runners. How do your results differ from those in Exercise 3?
Explain.

6. A trainer wants to estimate the population mean running times for both male and
female runners within 2 minutes. Determine the minimum sample size required to
construct a 99% confidence interval for the population mean training time of
(a) male runners. Assume the population standard deviation is 8.9 minutes.
(b) female runners. Assume the population standard deviation is 8.4 minutes.

WHAT YOU SHOULD LEARN


» How to interpret the 
t-distribution and use a 
t-distribution table 


» How to construct confidence 
intervals when n < 30, 
the population is normally 
distributed, and $\sigma$ is unknown


INSIGHT 


The following example illustrates 
the concept of degrees of 
freedom. Suppose the number 
of chairs in a classroom equals 
the number of students: 

25 chairs and 25 students. 
Each of the first 24 students 
to enter the classroom 
has a choice as to which 
chair he or she will sit 
in. There is no freedom 
of choice, however, 

for the 25th student 
who enters the room. 




Confidence Intervals for the Mean (Small Samples) 


The t-Distribution » Confidence Intervals and t-Distributions


> THE t-DISTRIBUTION 


In many real-life situations, the population standard deviation is unknown. 
Moreover, because of various constraints such as time and cost, it is often not 
practical to collect samples of size 30 or more. So, how can you construct a 
confidence interval for a population mean given such circumstances? If the 
random variable is normally distributed (or approximately normally distributed), 
you can use a t-distribution.


DEFINITION 


If the distribution of a random variable x is approximately normal, then

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

follows a t-distribution.


Critical values of t are denoted by $t_c$. Several properties of the t-distribution
are as follows.

1. The t-distribution is bell-shaped and symmetric about the mean.

2. The t-distribution is a family of curves, each determined by a parameter
called the degrees of freedom. The degrees of freedom are the number of
free choices left after a sample statistic such as $\bar{x}$ is calculated. When you
use a t-distribution to estimate a population mean, the degrees of freedom
are equal to one less than the sample size.

d.f. = n − 1     Degrees of freedom

3. The total area under a curve is 1 or 100%.

4. The mean, median, and mode of the t-distribution are equal to 0.

5. As the degrees of freedom increase, the t-distribution approaches the
normal distribution. After 30 d.f., the t-distribution is very close to the
standard normal z-distribution.


[Figure: t-distribution curves drawn with the standard normal curve. The tails in
the t-distribution are “thicker” than those in the standard normal distribution.]

Table 5 in Appendix B lists critical values of t for selected confidence
intervals and degrees of freedom.

EXAMPLE 1


> Finding Critical Values of t
Find the critical value $t_c$ for a 95% confidence level when the sample size is 15.
> Solution
Because n = 15, the degrees of freedom are
d.f. = n − 1
= 15 − 1
= 14.


A portion of Table 5 is shown. Using d.f. = 14 and c = 0.95, you can find the
critical value $t_c$, as shown by the highlighted areas in the table.

Level of confidence, c    0.50    0.80    0.90    0.95     0.98
One tail, α               0.25    0.10    0.05    0.025    0.01
Two tails, α              0.50    0.20    0.10    0.05     0.02
d.f.
 1                       1.000   3.078   6.314   12.706   31.821
 2                        .816   1.886   2.920    4.303    6.965
 3                        .765   1.638   2.353    3.182    4.541
12                        .695   1.356   1.782    2.179    2.681
13                        .694   1.350   1.771    2.160    2.650
14                        .692   1.345   1.761    2.145    2.624
15                        .691   1.341   1.753    2.131    2.602
16                        .690   1.337   1.746    2.120    2.583
28                        .683   1.313   1.701    2.048    2.467
29                        .683   1.311   1.699    2.045    2.462
∞                         .674   1.282   1.645    1.960    2.326

From the table, you can see that $t_c$ = 2.145. The graph shows the t-distribution
for 14 degrees of freedom, c = 0.95, and $t_c$ = 2.145.

[Figure: t-distribution curve with c = 0.95 shaded between $-t_c$ = −2.145 and $t_c$ = 2.145]

STUDY TIP

Unlike the z-table, critical values for a specific confidence interval can be
found in the column headed by c in the appropriate d.f. row. (The symbol α
will be explained in Chapter 7.)

INSIGHT

For 30 or more degrees of freedom, the critical values for the t-distribution are
close to the corresponding critical values for the normal distribution.
Moreover, the values in the last row of the table marked ∞ d.f. correspond
exactly to the normal distribution values.


Interpretation So, 95% of the area under the t-distribution curve with
14 degrees of freedom lies between t = −2.145 and t = 2.145.
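
When a printed table is not at hand, a critical value can be approximated numerically. The following is a rough sketch under stated assumptions, not the method used by Table 5: it integrates the standard t density with Simpson's rule and uses bisection to find the value $t_c$ for which the area between $-t_c$ and $t_c$ equals c. The helper names `t_pdf`, `t_area`, and `t_critical` are introduced here for illustration.

```python
import math

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom (standard formula)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_area(x, df, steps=2000):
    """Area under the t curve between -x and x, using Simpson's rule on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3  # double the half-area by symmetry

def t_critical(c, df):
    """Approximate t_c such that the central area equals c (bisection on [0, 100])."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_area(mid, df) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.95, 14), 3))  # compare with the Table 5 entry used in Example 1
```

The search interval [0, 100] is an assumption that is adequate for the confidence levels and degrees of freedom used in this section; very small d.f. combined with very high confidence levels would need a wider interval.
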


> Try It Yourself 1 
Find the critical value $t_c$ for a 90% confidence level when the sample size is 22.


a. Identify the degrees of freedom.
b. Identify the level of confidence c.
c. Use Table 5 in Appendix B to find $t_c$.
Answer: Page A40

STUDY TIP

For a TI-83/84 Plus, constructing a confidence interval using the
t-distribution is similar to constructing a confidence interval
using the normal distribution.

STAT

Choose the TESTS menu.
8: TInterval...

Select the Data input option if you use the original data. Select
the Stats input option if you use the descriptive statistics. In each
case, enter the appropriate values, then select Calculate. Your results
may vary slightly depending on the method you use. For Example 2, the
descriptive statistics were entered.

TInterval
(156.67, 167.33)
x̄ = 162
Sx = 10
n = 16

> CONFIDENCE INTERVALS AND t-DISTRIBUTIONS


Constructing a confidence interval using the t-distribution is similar to
constructing a confidence interval using the normal distribution: both use a
point estimate $\bar{x}$ and a margin of error E.


GUIDELINES 


Constructing a Confidence Interval for the Mean: t-Distribution

IN WORDS / IN SYMBOLS

1. Find the sample statistics n, $\bar{x}$, and s.
   $\bar{x} = \frac{\sum x}{n}$,  $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

2. Identify the degrees of freedom, the level of confidence c, and
   the critical value $t_c$.
   d.f. = n − 1

3. Find the margin of error E.
   $E = t_c \frac{s}{\sqrt{n}}$

4. Find the left and right endpoints and form the confidence interval.
   Left endpoint: $\bar{x} - E$
   Right endpoint: $\bar{x} + E$
   Interval: $\bar{x} - E < \mu < \bar{x} + E$


EXAMPLE 2

See MINITAB steps on page 352.


>» Constructing a Confidence Interval 


You randomly select 16 coffee shops and measure the temperature of the 
coffee sold at each. The sample mean temperature is 162.0°F with a sample 
standard deviation of 10.0°F. Construct a 95% confidence interval for the 
population mean temperature. Assume the temperatures are approximately 
normally distributed. 


> Solution 

Because the sample size is less than 30, $\sigma$ is unknown, and the temperatures
are approximately normally distributed, you can use the t-distribution. Using
n = 16, $\bar{x}$ = 162.0, s = 10.0, c = 0.95, and d.f. = 15, you can use Table 5 to
find that $t_c$ = 2.131. The margin of error at the 95% confidence level is

$$E = t_c \frac{s}{\sqrt{n}} = 2.131 \cdot \frac{10.0}{\sqrt{16}} \approx 5.3.$$

The confidence interval is as follows.

Left Endpoint: $\bar{x} - E \approx 162 - 5.3 = 156.7$
Right Endpoint: $\bar{x} + E \approx 162 + 5.3 = 167.3$

$$156.7 < \mu < 167.3$$


Interpretation With 95% confidence, you can say that the population mean 
temperature of coffee sold is between 156.7°F and 167.3°F. 
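
The arithmetic in Example 2 can be reproduced with a short Python sketch. The function name `t_interval` is introduced here for illustration, and the critical value is passed in as read from Table 5 rather than computed.

```python
import math

def t_interval(x_bar, s, n, t_c):
    """Return (left endpoint, right endpoint, margin of error) for a t-based interval.

    t_c is the critical value looked up in a t-table for n - 1 degrees of freedom.
    """
    E = t_c * s / math.sqrt(n)
    return x_bar - E, x_bar + E, E

# Values from Example 2: n = 16, sample mean 162.0, s = 10.0, t_c = 2.131 (d.f. = 15).
left, right, E = t_interval(162.0, 10.0, 16, 2.131)
print(f"E = {E:.1f}, interval = ({left:.1f}, {right:.1f})")
```

Changing only `t_c` gives the intervals for other confidence levels, which is one way to see how the width responds to the level of confidence.
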

> Try It Yourself 2


Construct 90% and 99% confidence intervals for the population mean
temperature.


a. Find $t_c$ and E for each level of confidence.
b. Use $\bar{x}$ and E to find the left and right endpoints of the confidence interval.
c. Interpret the results.
Answer: Page A40


EXAMPLE 3

See TI-83/84 Plus steps.

> Constructing a Confidence Interval

You randomly select 20 cars of the same model that were sold at a car
dealership and determine the number of days each car sat on the dealership’s
lot before it was sold. The sample mean is 9.75 days, with a sample standard
deviation of 2.39 days. Construct a 99% confidence interval for the population
mean number of days the car model sits on the dealership’s lot. Assume the
days on the lot are normally distributed.

To explore this topic further, see Activity 6.2 on page 326.


> Solution 


Because the sample size is less than 30, $\sigma$ is unknown, and the days on the
lot are normally distributed, you can use the t-distribution. Using n = 20,
$\bar{x}$ = 9.75, s = 2.39, c = 0.99, and d.f. = 19, you can use Table 5 to find that
$t_c$ = 2.861. The margin of error at the 99% confidence level is

$$E = t_c \frac{s}{\sqrt{n}} = 2.861 \cdot \frac{2.39}{\sqrt{20}} \approx 1.53.$$

The confidence interval is as follows.

Left Endpoint: $\bar{x} - E \approx 9.75 - 1.53 = 8.22$
Right Endpoint: $\bar{x} + E \approx 9.75 + 1.53 = 11.28$

$$8.22 < \mu < 11.28$$


Interpretation With 99% confidence, you can say that the population mean 
number of days the car model sits on the dealership’s lot is between 8.22 and 
11.28. 


> Try It Yourself 3 


Construct 90% and 95% confidence intervals for the population mean number 
of days the car model sits on the dealership’s lot. Compare the widths of the 
confidence intervals. 


a. Find $t_c$ and E for each level of confidence.
b. Use $\bar{x}$ and E to find the left and right endpoints of the confidence interval.
c. Interpret the results and compare the widths of the confidence intervals.
Answer: Page A40

Two footballs, one filled with air and the other filled with helium,
were kicked on a windless day at Ohio State University. The
footballs were alternated with each kick. After 10 practice kicks,
each football was kicked 29 more times. The distances (in yards) are
listed. (Source: The Columbus Dispatch)


[Stem-and-leaf plots of the kick distances: Air Filled (Key: 1|9 = 19) and Helium Filled]


Assume that the distances are 
normally distributed for each 
football. Apply the flowchart 
at the right to each sample. 
Construct a 95% confidence 
interval for the population 
mean distance each football 
traveled. Do the confidence 
intervals overlap? What 

does this result tell you? 


The following flowchart describes when to use the normal distribution
and when to use a t-distribution to construct a confidence interval for the
population mean.

Is n ≥ 30?
   Yes: Use the normal distribution with $E = z_c \frac{\sigma}{\sqrt{n}}$.
        If $\sigma$ is unknown, use s instead.
   No:  Is the population normally, or approximately normally, distributed?
        No:  You cannot use the normal distribution or the t-distribution.
        Yes: Is $\sigma$ known?
             Yes: Use the normal distribution with $E = z_c \frac{\sigma}{\sqrt{n}}$.
             No:  Use the t-distribution with $E = t_c \frac{s}{\sqrt{n}}$
                  and n − 1 degrees of freedom.
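
The decisions in the flowchart can be written as a small function. This is a minimal sketch; the function name `choose_distribution` and its return labels are introduced here for illustration.

```python
def choose_distribution(n, population_normal, sigma_known):
    """Follow the flowchart: return "normal", "t", or "neither".

    n                  sample size
    population_normal  True if the population is (approximately) normally distributed
    sigma_known        True if the population standard deviation is known
    """
    if n >= 30:
        return "normal"   # use s in place of sigma if sigma is unknown
    if not population_normal:
        return "neither"
    return "normal" if sigma_known else "t"

# Example 2 (coffee temperatures): n = 16, approximately normal, sigma unknown.
print(choose_distribution(16, True, False))
# Example 4 (construction costs): n = 25, normal, sigma known.
print(choose_distribution(25, True, True))
```
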


EXAMPLE 4 


» Choosing the Normal Distribution or the t-Distribution 


You randomly select 25 newly constructed houses. The sample mean 
construction cost is $181,000 and the population standard deviation is $28,000. 
Assuming construction costs are normally distributed, should you use the 
normal distribution, the t-distribution, or neither to construct a 95%
confidence interval for the population mean construction cost? Explain your 
reasoning. 


> Solution 


Because the population is normally distributed and the population standard 
deviation is known, you should use the normal distribution. 


> Try It Yourself 4 


You randomly select 18 adult male athletes and measure the resting heart 
rate of each. The sample mean heart rate is 64 beats per minute, with a sample 
standard deviation of 2.5 beats per minute. Assuming the heart rates are 
normally distributed, should you use the normal distribution, the t-distribution,
or neither to construct a 90% confidence interval for the population mean 
heart rate? Explain your reasoning. 


Use the flowchart above to determine which distribution you should use to 
construct the 90% confidence interval for the population mean heart rate. 


Answer: Page A40 

BUILDING BASIC SKILLS AND VOCABULARY


In Exercises 1-4, find the critical value $t_c$ for the given confidence level c and
sample size n. 


1. c = 0.90, n = 10 2. c = 0.95, n = 12 
3. c = 0.99, n = 16 4. c = 0.98, n = 20 


In Exercises 5-8, find the margin of error for the given values of c, s, and n. 
5. c = 0.95, s = 5, n = 16 6. c = 0.99, s = 3, n = 6
7. c = 0.90, s = 2.4, n = 12 8. c = 0.98, s = 4.7, n = 9


In Exercises 9-12, (a) construct the indicated confidence interval for the 
population mean $\mu$ using a t-distribution. (b) If you had incorrectly used a normal
distribution, which interval would be wider? 


9. c = 0.90, $\bar{x}$ = 12.5, s = 2.0, n = 6
10. c = 0.95, $\bar{x}$ = 13.4, s = 0.85, n = 8
11. c = 0.98, $\bar{x}$ = 4.3, s = 0.34, n = 14
12. c = 0.99, $\bar{x}$ = 24.7, s = 4.6, n = 10


In Exercises 13-16, use the given confidence interval to find the margin of error 
and the sample mean. 


13. (14.7, 22.1) 14. (6.17, 8.53)
15. (64.6, 83.6) 16. (16.2, 29.8) 


USING AND INTERPRETING CONCEPTS


Constructing Confidence Intervals In Exercises 17 and 18, you are 
given the sample mean and the sample standard deviation. Assume the random 
variable is normally distributed and use a t-distribution to find the margin of error 
and construct a 95% confidence interval for the population mean. Interpret the 
results. If convenient, use technology to construct the confidence interval. 


17. Commute Time to Work In a random sample of eight people, the mean 
commute time to work was 35.5 minutes and the sample standard deviation 
was 7.2 minutes. 


18. Driving Distance to Work In a random sample of five people, the mean 
driving distance to work was 22.2 miles and the sample standard deviation 
was 5.8 miles. 


19. You research commute times to work and find that the population standard 
deviation was 9.3 minutes. Repeat Exercise 17, using a normal distribution 
with the appropriate calculations for a standard deviation that is known. 
Compare the results. 


20. You research driving distances to work and find that the population standard 
deviation was 5.2 miles. Repeat Exercise 18, using a normal distribution with 
the appropriate calculations for a standard deviation that is known. Compare 
the results. 




Constructing Confidence Intervals In Exercises 21 and 22, you are given 
the sample mean and the sample standard deviation. Assume the random variable 
is normally distributed and use a normal distribution or a t-distribution to 
construct a 90% confidence interval for the population mean. If convenient, use 
technology to construct the confidence interval. 


21. Waste Generated (a) In a random sample of 10 adults from the United 
States, the mean waste generated per person per day was 4.50 pounds and the 
standard deviation was 1.21 pounds. (b) Repeat part (a), assuming the same 
statistics came from a sample size of 500. Compare the results. (Adapted from 
U.S. Environmental Protection Agency) 


22. Waste Recycled (a) In a random sample of 12 adults from the United 
States, the mean waste recycled per person per day was 1.50 pounds and the 
standard deviation was 0.28 pound. (b) Repeat part (a), assuming the same 
statistics came from a sample size of 600. Compare the results. (Adapted from 
U.S. Environmental Protection Agency) 


Constructing Confidence Intervals In Exercises 23-26, a data set is given.
For each data set, (a) find the sample mean, (b) find the sample standard deviation, 
and (c) construct a 99% confidence interval for the population mean. Assume the 
population of each data set is normally distributed. If convenient, use a technology 
tool. 


23. Earnings The annual earnings of 16 randomly selected computer software 
engineers (Adapted from U.S. Bureau of Labor Statistics) 


92,184 86,919 90,176 91,740 95,535 90,108 94,815 88,114 
85,406 90,197 89,944 93,950 84,116 96,054 85,119 88,549 


24. Earnings The annual earnings of 14 randomly selected physical therapists 
(Adapted from U.S. Bureau of Labor Statistics) 


63,118 65,740 72,899 68,500 66,726 65,554 69,247 
64,963 68,627 70,448 71,842 66,873 74,103 71,138 


25. SAT Scores The SAT scores of 12 randomly selected high school seniors 


1704 1940 1518 2005 1432 1872 
1998 1658 1825 1670 2210 1380 


26. GPA The grade point averages (GPA) of 15 randomly selected college 
students 


2.3 3.3 2.6 1.8 0.2 3.1 4.0 0.7
2.3 2.0 3.1 3.4 1.3 2.6 2.6


Choosing a Distribution In Exercises 27-32, use a normal distribution
or a t-distribution to construct a 95% confidence interval for the population 
mean. Justify your decision. If neither distribution can be used, explain why. 
Interpret the results. If convenient, use technology to construct the confidence 
interval. 


27. Body Mass Index In a random sample of 50 people, the mean body mass 
index (BMI) was 27.7 and the standard deviation was 6.12. Assume the body 
mass indexes are normally distributed. (Adapted from Centers for Disease 
Control) 


28. Mortgages In a random sample of 15 mortgage institutions, the mean 
interest rate was 4.99% and the standard deviation was 0.36%. Assume the 
interest rates are normally distributed. (Adapted from Federal Reserve) 

29. Sports Cars: Miles per Gallon You take a random survey of 25 sports
cars and record the miles per gallon for each. The data are listed below.
Assume the miles per gallon are normally distributed.


15 27 24 24 20 21 24 14 21 25 21 13 21
25 22 21 25 24 22 24 24 22 21 24 24


30. Yards Per Carry In a recent season, the standard deviation of the
yards per carry for all running backs was 1.34. The yards per carry of 20
randomly selected running backs are listed below. Assume the yards per
carry are normally distributed. (Source: National Football League)


5.6 4.4 3.8 4.5 3.3 5.0 3.6 3.7 4.8 3.5
5.6 3.0 6.8 4.7 2.2 3.3 5.7 3.0 5.0 4.5


31. Hospital Waiting Times In a random sample of 19 patients at a hospital’s
minor emergency department, the mean waiting time before seeing a
medical professional was 23 minutes and the standard deviation was
11 minutes. Assume the waiting times are not normally distributed.


32. Hospital Length of Stay In a random sample of 13 people, the mean length
of stay at a hospital was 6.3 days and the standard deviation was 1.7 days.
Assume the lengths of stay are normally distributed. (Adapted from American
Hospital Association)


In Exercises 33 and 34, use StatCrunch to construct the 90%, 95%, and 99% 
confidence intervals for the population mean. Interpret the results and compare the 
widths of the confidence intervals. Assume the random variable is normally distributed. 


33. Homework The weekly time spent (in hours) on homework for 18 randomly
selected high school students


12.0 11.3 13.5 11.7 12.0 13.0 15.5 10.8 12.5
12.3 14.0 9.5 8.8 10.0 12.8 15.0 11.8 13.0


34. Weight Lifting In a random sample of 11 college football players, the mean
weekly time spent weight lifting was 7.2 hours and the standard deviation
was 1.9 hours.


EXTENDING CONCEPTS


35. Tennis Ball Manufacturing A company manufactures tennis balls. When
its tennis balls are dropped onto a concrete surface from a height of
100 inches, the company wants the mean height the balls bounce upward to
be 55.5 inches. This average is maintained by periodically testing random
samples of 25 tennis balls. If the t-value falls between $-t_{0.99}$ and $t_{0.99}$, the
company will be satisfied that it is manufacturing acceptable tennis balls. A
sample of 25 balls is randomly selected and tested. The mean bounce height
of the sample is 56.0 inches and the standard deviation is 0.25 inch. Assume
the bounce heights are approximately normally distributed. Is the company
making acceptable tennis balls? Explain your reasoning.


36. Light Bulb Manufacturing A company manufactures light bulbs. The
company wants the bulbs to have a mean life span of 1000 hours. This
average is maintained by periodically testing random samples of 16 light
bulbs. If the t-value falls between $-t_{0.99}$ and $t_{0.99}$, the company will be
satisfied that it is manufacturing acceptable light bulbs. A sample of 16 light
bulbs is randomly selected and tested. The mean life span of the sample is
1015 hours and the standard deviation is 25 hours. Assume the life spans are
approximately normally distributed. Is the company making acceptable light
bulbs? Explain your reasoning.

Confidence Intervals for a Mean (the impact of not knowing the standard deviation)


The confidence intervals for a mean (the impact of not knowing the standard
deviation) applet allows you to visually investigate confidence intervals for a
population mean. You can specify the sample size n, the shape of the distribution
(Normal or Right-skewed), the population mean (Mean), and the true
population standard deviation (Std. Dev.). When you click SIMULATE, 100
separate samples of size n will be selected from a population with these
population parameters. For each of the 100 samples, a 95% Z confidence interval
(known standard deviation) and a 95% T confidence interval (unknown standard
deviation) are displayed in the plot at the right. The 95% Z confidence interval is
displayed in green and the 95% T confidence interval is displayed in blue. If an
interval does not contain the population mean, it is displayed in red. Additional
simulations can be carried out by clicking SIMULATE multiple times. The
cumulative number of times that each type of interval contains the population
mean is also shown. Press CLEAR to clear existing results and start a new
simulation.

Explore

Step 1 Specify a value for n.
Step 2 Specify a distribution.
Step 3 Specify a value for the mean.
Step 4 Specify a value for the standard deviation.
Step 5 Click SIMULATE to generate the confidence intervals.

[Applet screen: input fields n, Distribution, Mean, and Std. Dev.; Simulate and
Clear buttons; cumulative results table with columns 95% Z CI and 95% T CI and
rows Contained mean, Did not contain mean, and Prop. contained]


m= Draw Conclusions 


1. Set n = 30, Mean = 25, Std. Dev. = 5, and the distribution to Normal.
Run the simulation so that at least 1000 confidence intervals are generated. 
Compare the proportion of the 95% Z confidence intervals and 95% T 
confidence intervals that contain the population mean. Is this what you would 
expect? Explain. 


2. In a random sample of 24 high school students, the mean number of hours of 
sleep per night during the school week was 7.26 hours and the standard 
deviation was 1.19 hours. Assume the sleep times are normally distributed. 
Run the simulation for n = 10 so that at least 500 confidence intervals are 
generated. What proportion of the 95% Z confidence intervals and 95% 
T confidence intervals contain the population mean? Should you use a Z 
confidence interval or a T confidence interval for the mean number of hours 
of sleep? Explain. 
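
The applet's comparison can also be sketched in code. The block below is only an illustration, not the applet's own implementation: it draws repeated normal samples and counts how often the 95% z-interval (known standard deviation) and the 95% t-interval (sample standard deviation) contain the population mean. The function name `simulate_coverage`, the seed, and the number of trials are assumptions made for this sketch, and the t critical value 2.045 (d.f. = 29) is assumed to be read from a t-distribution table.

```python
import random
import statistics
from statistics import NormalDist


def simulate_coverage(n, mean, sigma, t_crit, trials=2000, seed=1):
    """Return (z_coverage, t_coverage) for 95% intervals on normal samples.

    t_crit is the t critical value for d.f. = n - 1, taken from a table
    (an assumption of this sketch; it is not computed here).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
    z_hits = 0
    t_hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mean, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)
        e_z = z_crit * sigma / n ** 0.5  # margin of error, sigma known
        e_t = t_crit * s / n ** 0.5      # margin of error, sigma unknown
        z_hits += xbar - e_z < mean < xbar + e_z
        t_hits += xbar - e_t < mean < xbar + e_t
    return z_hits / trials, t_hits / trials


# Settings from Draw Conclusions 1: n = 30, Mean = 25, Std. Dev. = 5.
z_cov, t_cov = simulate_coverage(n=30, mean=25, sigma=5, t_crit=2.045)
print(z_cov, t_cov)
```

Both proportions should land near 0.95, which is what the confidence level leads you to expect for either type of interval.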




SECTION 6.3 CONFIDENCE INTERVALS FOR POPULATION PROPORTIONS 327 


WHAT YOU SHOULD LEARN 


> How to find a point estimate
for a population proportion


> How to construct a confidence
interval for a population
proportion


> How to determine the
minimum sample size required
when estimating a population
proportion


INSIGHT 


In the first two sections,
estimates were made
for quantitative data.
In this section, sample
proportions are used
to make estimates for
qualitative data.


Confidence Intervals for Population Proportions 


Point Estimate for a Population Proportion > Confidence Intervals for a
Population Proportion > Finding a Minimum Sample Size


> POINT ESTIMATE FOR A POPULATION PROPORTION


Recall from Section 4.2 that the probability of success in a single trial of a 
binomial experiment is p. This probability is a population proportion. In this 
section, you will learn how to estimate a population proportion p using a 
confidence interval. As with confidence intervals for $\mu$, you will start with a point
estimate. 


DEFINITION 


The point estimate for p, the population proportion of successes, is given by
the proportion of successes in a sample and is denoted by

$$\hat{p} = \frac{x}{n} \qquad \text{Sample proportion}$$

where x is the number of successes in the sample and n is the sample size. The
point estimate for the population proportion of failures is $\hat{q} = 1 - \hat{p}$. The
symbols $\hat{p}$ and $\hat{q}$ are read as “p hat” and “q hat.”


EXAMPLE 1 


> Finding a Point Estimate for p 

In a survey of 1000 U.S. adults, 662 said that it is acceptable to check personal
e-mail while at work. Find a point estimate for the population proportion of
U.S. adults who say it is acceptable to check personal e-mail while at work.
(Adapted from Liberty Mutual) 


> Solution 

Using n = 1000 and x = 662,

$$\hat{p} = \frac{x}{n} = \frac{662}{1000} = 0.662 = 66.2\%.$$


So, the point estimate for the population proportion of U.S. adults who say it 
is acceptable to check personal e-mail while at work is 66.2%. 


> Try It Yourself 1 


In a survey of 1006 U.S. adults, 181 said that Abraham Lincoln was the
greatest president. Find a point estimate for the population proportion of
U.S. adults who say Abraham Lincoln was the greatest president. (Adapted from
The Gallup Poll)


a. Identify x and n.
b. Use x and n to find $\hat{p}$. Answer: Page A40
In a recent year, there were
about 9600 bird-aircraft collisions
reported. A poll surveyed 2138
people about bird-aircraft
collisions. Of those surveyed,
667 said that they are worried
about bird-aircraft collisions.
(Adapted from TripAdvisor)


Find a 90% confidence interval
for the population proportion
of people that are worried
about bird-aircraft collisions.


STUDY TIP 

Here are instructions for 
constructing a confidence interval 
for a population proportion on 

a TI-83/84 Plus. 


Choose the TESTS menu. 
A: 1-PropZInt


Enter the values of 

X, n, and the level of 
confidence c (C-Level). 
Then select Calculate. 
> CONFIDENCE INTERVALS FOR A POPULATION
PROPORTION


Constructing a confidence interval for a population proportion p is similar to 
constructing a confidence interval for a population mean. You start with a point 
estimate and calculate a margin of error. 


DEFINITION 


A c-confidence interval for a population proportion p is

$$\hat{p} - E < p < \hat{p} + E$$

where

$$E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}.$$

The probability that the confidence interval contains p is c.


In Section 5.5, you learned that a binomial distribution can be approximated
by a normal distribution if $np \ge 5$ and $nq \ge 5$. When $n\hat{p} \ge 5$ and $n\hat{q} \ge 5$, the
sampling distribution of $\hat{p}$ is approximately normal with a mean of

$$\mu_{\hat{p}} = p$$

and a standard error of

$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}.$$


GUIDELINES


Constructing a Confidence Interval for a Population Proportion


IN WORDS, with IN SYMBOLS beneath each step

1. Identify the sample statistics n and x.

2. Find the point estimate $\hat{p}$.
   $\hat{p} = \frac{x}{n}$

3. Verify that the sampling distribution of $\hat{p}$ can be approximated by a
   normal distribution.
   $n\hat{p} \ge 5$, $n\hat{q} \ge 5$

4. Find the critical value $z_c$ that corresponds to the given level
   of confidence c.
   Use the Standard Normal Table or technology.

5. Find the margin of error E.
   $E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}$

6. Find the left and right endpoints and form the confidence interval.
   Left endpoint: $\hat{p} - E$
   Right endpoint: $\hat{p} + E$
   Interval: $\hat{p} - E < p < \hat{p} + E$
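
The six steps above can be sketched as a short Python function. This is a minimal illustration under stated assumptions: the name `proportion_ci` is hypothetical, and the critical value is computed with the standard library's `NormalDist` rather than read from the Standard Normal Table, so it may differ from a table value in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, c):
    """Return (left, right, E) for a c-confidence interval for p."""
    p_hat = x / n            # Step 2: point estimate
    q_hat = 1 - p_hat
    # Step 3: check that the normal approximation is reasonable.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("n*p_hat and n*q_hat must both be at least 5")
    # Step 4: critical value with area (1 + c) / 2 to its left.
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    # Step 5: margin of error.
    e = z_c * sqrt(p_hat * q_hat / n)
    # Step 6: endpoints.
    return p_hat - e, p_hat + e, e


# Data from Example 1: x = 662 successes in n = 1000, at 95% confidence.
left, right, e = proportion_ci(662, 1000, 0.95)
print(round(left, 3), round(right, 3), round(e, 3))
```

With the data from Example 1, the rounded endpoints agree with the interval worked by hand in Example 2.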




EXAMPLE 2


MINITAB and TI-83/84 Plus steps are shown on pages
352 and 353.


> Constructing a Confidence Interval for p 


Use the data given in Example 1 to construct a 95% confidence interval for the
population proportion of U.S. adults who say that it is acceptable to check 
personal e-mail while at work. 


> Solution 
From Example 1, $\hat{p} = 0.662$. So,

$$\hat{q} = 1 - 0.662 = 0.338.$$

Using n = 1000, you can verify that the sampling distribution of $\hat{p}$ can be
approximated by a normal distribution.

$$n\hat{p} = 1000 \cdot 0.662 = 662 > 5$$
$$n\hat{q} = 1000 \cdot 0.338 = 338 > 5$$

Using $z_c = 1.96$, the margin of error is

$$E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96 \sqrt{\frac{(0.662)(0.338)}{1000}} \approx 0.029.$$


The 95% confidence interval is as follows.


Left Endpoint: $\hat{p} - E = 0.662 - 0.029 = 0.633$
Right Endpoint: $\hat{p} + E = 0.662 + 0.029 = 0.691$

$$0.633 < p < 0.691$$


STUDY TIP


Notice in Example 2 that the
confidence interval for the
population proportion p is
rounded to three decimal
places. This round-off rule
will be used throughout
the text.


Interpretation With 95% confidence, you can say that the population 
proportion of U.S. adults who say that it is acceptable to check personal e-mail 
while at work is between 63.3% and 69.1%. 


> Try It Yourself 2 


Use the data given in Try It Yourself 1 to construct a 90% confidence interval 
for the population proportion of U.S. adults who say that Abraham Lincoln 
was the greatest president. 


a. Find $\hat{p}$ and $\hat{q}$.

b. Verify that the sampling distribution of $\hat{p}$ can be approximated by a normal
distribution.

c. Find $z_c$ and E.

d. Use $\hat{p}$ and E to find the left and right endpoints of the confidence interval.

e. Interpret the results. Answer: Page A40


The confidence level of 95% used in Example 2 is typical of opinion polls. 
The result, however, is usually not stated as a confidence interval. Instead, the 
result of Example 2 would be stated as “66.2% with a margin of error of ±2.9%.”
INSIGHT


In Example 3, note
that $n\hat{p} \ge 5$ and
$n\hat{q} \ge 5$. So, the
sampling distribution
of $\hat{p}$ is approximately
normal.


To explore this topic further,
see Activity 6.3 on page 336.


EXAMPLE 3


> Constructing a Confidence Interval for p


The graph shown at the right is from a survey of 498 U.S. adults. Construct a
99% confidence interval for the population proportion of U.S. adults who
think that teenagers are the more dangerous drivers. (Source: The Gallup Poll)


[Graph: “Who are the more dangerous drivers?” Only the label “No opinion 4%” survives.]


> Solution 
From the graph, $\hat{p} = 0.71$. So,

$$\hat{q} = 1 - 0.71 = 0.29.$$

Using these values and the values n = 498 and $z_c = 2.575$, the margin of
error is

$$E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}} \approx 2.575 \sqrt{\frac{(0.71)(0.29)}{498}} \approx 0.052.$$

Use Table 4 in Appendix B to estimate that $z_c$ is halfway between 2.57 and 2.58.


The 99% confidence interval is as follows.


Left Endpoint: $\hat{p} - E \approx 0.71 - 0.052 = 0.658$
Right Endpoint: $\hat{p} + E \approx 0.71 + 0.052 = 0.762$

$$0.658 < p < 0.762$$


Interpretation With 99% confidence, you can say that the population 
proportion of U.S. adults who think that teenagers are the more dangerous 
drivers is between 65.8% and 76.2%. 
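
Example 3 estimates the critical value from a table as halfway between 2.57 and 2.58. As a hedged cross-check, the sketch below computes critical values directly with Python's standard library; the helper name `z_critical` is an assumption of this illustration and is not part of the text's procedure.

```python
from statistics import NormalDist


def z_critical(c):
    """Critical value z_c that leaves area (1 - c) / 2 in each tail."""
    return NormalDist().inv_cdf((1 + c) / 2)


# Common confidence levels used in this chapter.
for c in (0.90, 0.95, 0.99):
    print(c, round(z_critical(c), 3))
```

The computed value for c = 0.99 differs from the table estimate 2.575 only in the third decimal place, so the margin of error in Example 3 is unchanged after rounding to three decimal places.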


> Try It Yourself 3 


Use the data given in Example 3 to construct a 99% confidence interval for the 
population proportion of adults who think that people over 65 are the more 
dangerous drivers. 


a. Find $\hat{p}$ and $\hat{q}$.

b. Verify that the sampling distribution of $\hat{p}$ can be approximated by a normal
distribution.

c. Find $z_c$ and E.

d. Use $\hat{p}$ and E to find the left and right endpoints of the confidence interval.

e. Interpret the results. Answer: Page A40




> FINDING A MINIMUM SAMPLE SIZE


One way to increase the precision of a confidence interval without decreasing the
level of confidence is to increase the sample size.


FINDING A MINIMUM SAMPLE SIZE TO ESTIMATE p


Given a c-confidence level and a margin of error E, the minimum sample size
n needed to estimate p is

$$n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2.$$

This formula assumes that you have preliminary estimates of $\hat{p}$ and $\hat{q}$. If not,
use $\hat{p} = 0.5$ and $\hat{q} = 0.5$.


INSIGHT


The reason for using 0.5 as the values of $\hat{p}$ and $\hat{q}$ when no
preliminary estimate is available is that these values yield a
maximum value of the product $\hat{p}\hat{q} = \hat{p}(1 - \hat{p})$.
In other words, if you don’t estimate the values of $\hat{p}$ and $\hat{q}$, you
must pay the penalty of using a larger sample.


EXAMPLE 4


> Determining a Minimum Sample Size


You are running a political campaign and wish to estimate, with 95% 
confidence, the population proportion of registered voters who will vote for 
your candidate. Your estimate must be accurate within 3% of the population 
proportion. Find the minimum sample size needed if (1) no preliminary 
estimate is available and (2) a preliminary estimate gives $\hat{p} = 0.31$. Compare
your results. 


> Solution 


1. Because you do not have a preliminary estimate of $\hat{p}$, use $\hat{p} = 0.5$ and
$\hat{q} = 0.5$. Using $z_c = 1.96$ and E = 0.03, you can solve for n.

$$n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2 = (0.5)(0.5)\left(\frac{1.96}{0.03}\right)^2 \approx 1067.11$$

Because n is a decimal, round up to the nearest whole number, 1068.


2. You have a preliminary estimate of $\hat{p} = 0.31$. So, $\hat{q} = 0.69$. Using $z_c = 1.96$
and E = 0.03, you can solve for n.

$$n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2 = (0.31)(0.69)\left(\frac{1.96}{0.03}\right)^2 \approx 913.02$$

Because n is a decimal, round up to the nearest whole number, 914.


Interpretation With no preliminary estimate, the minimum sample size 
should be at least 1068 registered voters. With a preliminary estimate of 
p = 0.31, the sample size should be at least 914 registered voters. So, you will 
need a larger sample size if no preliminary estimate is available. 
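
The two calculations in Example 4 can be reproduced with a small helper. This is a sketch only: the function name `min_sample_size` and its default of 0.5 for the preliminary estimate are choices made for this illustration, and the critical value is passed in as read from the table.

```python
from math import ceil


def min_sample_size(z_c, e, p_hat=0.5):
    """Minimum n so that the margin of error is at most e.

    p_hat defaults to 0.5, the value used when no preliminary
    estimate is available. The result is always rounded up.
    """
    q_hat = 1 - p_hat
    return ceil(p_hat * q_hat * (z_c / e) ** 2)


# Example 4: 95% confidence (z_c = 1.96), accurate within 3%.
print(min_sample_size(1.96, 0.03))        # no preliminary estimate
print(min_sample_size(1.96, 0.03, 0.31))  # preliminary estimate 0.31
```

Rounding up with `ceil`, rather than rounding to the nearest integer, mirrors the rule in the example: a sample of 1067 would give a margin of error slightly larger than 3%.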


> Try It Yourself 4 


You wish to estimate, with 90% confidence, the population proportion of 
females who refuse to eat leftovers. Your estimate must be accurate within 2% 
of the population proportion. Find the minimum sample size needed if (1) no 
preliminary estimate is available and (2) a previous survey found that 11% of 
females refuse to eat leftovers. (Source: Consumer Reports National Research Center) 


a. Identify $\hat{p}$, $\hat{q}$, $z_c$, and E. If $\hat{p}$ is unknown, use 0.5.
b. Use $\hat{p}$, $\hat{q}$, $z_c$, and E to find the minimum sample size n.
c. Determine how many females should be included in the sample.
Answer: Page A40
BUILDING BASIC SKILLS AND VOCABULARY


True or False? In Exercises 1 and 2, determine whether the statement is true or 
false. If it is false, rewrite it as a true statement. 


1. To estimate the value of p, the population proportion of successes, use
the point estimate x.


2. The point estimate for the proportion of failures is $1 - \hat{p}$.


Finding $\hat{p}$ and $\hat{q}$ In Exercises 3-6, let p be the population proportion for the
given condition. Find point estimates of p and q.


3. Recycling In a survey of 1002 U.S. adults, 752 say they recycle. (Adapted
from ABC News Poll)


4. Charity In a survey of 2939 U.S. adults, 2439 say they have contributed to
a charity in the past 12 months. (Adapted from Harris Interactive)


5. Computers In a survey of 11,605 parents, 4912 think that the government
should subsidize the costs of computers for lower-income families. (Adapted
from Disney Family.com)


6. Vacation In a survey of 1003 U.S. adults, 110 say they would go on vacation
to Europe if cost did not matter. (Adapted from The Gallup Poll)


In Exercises 7-10, use the given confidence interval to find the margin of error and 
the sample proportion. 


7. (0.905, 0.933) 8. (0.245, 0.475) 
9. (0.512, 0.596) 10. (0.087, 0.263) 


USING AND INTERPRETING CONCEPTS


Constructing Confidence Intervals In Exercises 11 and 12, construct 90%
and 95% confidence intervals for the population proportion. Interpret the results 
and compare the widths of the confidence intervals. If convenient, use technology 
to construct the confidence intervals. 


11. Dental Visits In a survey of 674 U.S. males ages 18-64, 396 say they have 
gone to the dentist in the past year. (Adapted from National Center for Health 
Statistics) 


12. Dental Visits In a survey of 420 U.S. females ages 18-64, 279 say they have
gone to the dentist in the past year. (Adapted from National Center for Health 
Statistics) 


Constructing Confidence Intervals In Exercises 13 and 14, construct a 
99% confidence interval for the population proportion. Interpret the results. 


13. Going Green In a survey of 3110 U.S. adults, 1435 say they have started
paying bills online in the last year. (Adapted from Harris Interactive) 


14. Seen a Ghost In a survey of 4013 U.S. adults, 722 say they have seen a ghost.
(Adapted from Pew Research Center) 
15. Nail Polish In a survey of 7000 women, 4431 say they change their nail
polish once a week. Construct a 95% confidence interval for the population
proportion of women who change their nail polish once a week. (Adapted
from Essie Cosmetics)


16. World Series In a survey of 891 U.S. adults who follow baseball in a recent
year, 184 said that the Boston Red Sox would win the World Series.
Construct a 90% confidence interval for the population proportion of U.S.
adults who follow baseball who in a recent year said that the Boston Red Sox
would win the World Series. (Adapted from Harris Interactive)


17. Alternative Energy You wish to estimate, with 95% confidence, the
population proportion of U.S. adults who want more funding for alternative
energy. Your estimate must be accurate within 4% of the population 
proportion. 


(a) No preliminary estimate is available. Find the minimum sample size 
needed. 


(b) Find the minimum sample size needed, using a prior study that found 
that 78% of U.S. adults want more funding for alternative energy. (Source: 
Pew Research Center) 


(c) Compare the results from parts (a) and (b). 


18. Reading Fiction You wish to estimate, with 99% confidence, the population
proportion of U.S. adults who read fiction books. Your estimate must be
accurate within 2% of the population proportion. 


(a) No preliminary estimate is available. Find the minimum sample size 
needed. 


(b) Find the minimum sample size needed, using a prior study that found 
that 47% of U.S. adults read fiction books. (Source: National Endowment 
for the Arts) 


(c) Compare the results from parts (a) and (b). 


19. Emergency Room Visits You wish to estimate, with 90% confidence, the
population proportion of U.S. adults who made one or more emergency
room visits in the past year. Your estimate must be accurate within 3% of the 
population proportion. 


(a) No preliminary estimate is available. Find the minimum sample size 
needed. 


(b) Find the minimum sample size needed, using a prior study that found 
that 20.1% of U.S. adults made one or more emergency room visits in the 
past year. (Source: National Center for Health Statistics) 


(c) Compare the results from parts (a) and (b). 


20. Ice Cream You wish to estimate, with 95% confidence, the population
proportion of U.S. adults who say chocolate is their favorite ice cream flavor.
Your estimate must be accurate within 5% of the population proportion. 


(a) No preliminary estimate is available. Find the minimum sample size 
needed. 


(b) Find the minimum sample size needed, using a prior study that found 
that 27% of U.S. adults say that chocolate is their favorite ice cream 
flavor. (Source: Harris Interactive) 


(c) Compare the results from parts (a) and (b). 
Constructing Confidence Intervals In Exercises 21 and 22, use the following
information. The graph shows the results of a survey in which 1017 adults from
the United States, 1060 adults from Italy, and 1126 adults from Great Britain
were asked if they believe climate change poses a large threat to the world.
(Source: Harris Interactive)


[Graph: “Does climate change pose a large threat to the world?” with one bar each for the United States, Italy, and Great Britain. The percentages did not survive.]


21. Global Warming Construct a 99% confidence interval for

(a) the population proportion of adults from the United States who say that 
climate change poses a large threat to the world. 


(b) the population proportion of adults from Italy who say that climate 
change poses a large threat to the world. 


(c) the population proportion of adults from Great Britain who say that 
climate change poses a large threat to the world. 


22. Global Warming Determine whether it is possible that the following 
proportions are equal and explain your reasoning. 


(a) The proportion of adults from Exercise 21(a) and the proportion of 


adults from Exercise 21(b). 


(b) The proportion of adults from Exercise 21(b) and the proportion of 


adults from Exercise 21(c). 


(c) The proportion of adults from Exercise 21(a) and the proportion of 


adults from Exercise 21(c). 


Constructing Confidence Intervals In Exercises 23 and 24, use the following
information. The table shows the results of a survey in which separate samples
of 400 adults each from the East, South, Midwest, and West were asked if traffic
congestion is a serious problem in their community.
(Adapted from Harris Interactive)


[Table: percent who say traffic congestion is a serious problem. East 36%, South 32%, West 56%. The Midwest entry did not survive.]


23. South and West Construct a 95% confidence interval for the population
proportion of adults


(a) from the South who say traffic congestion is a serious problem.


(b) from the West who say traffic congestion is a serious problem.


24. East and Midwest Construct a 95% confidence interval for the population 


proportion of adults 


(a) from the East who say traffic congestion is a serious problem. 


(b) from the Midwest who say traffic congestion is a serious problem. 


25. Writing Is it possible that the proportions in Exercise 23 are equal? What 
if you used a 99% confidence interval? Explain your reasoning. 


26. Writing Is it possible that the proportions in Exercise 24 are equal? What 
if you used a 99% confidence interval? Explain your reasoning. 
In Exercises 27 and 28, use StatCrunch to construct 90%, 95%, and 99%
confidence intervals for the population proportion. Interpret the results and 
compare the widths of the confidence intervals. 


27. Congress In a survey of 1025 U.S. adults, 802 disapprove of the job
Congress is doing. (Adapted from The Gallup Poll) 


28. UFOs In a survey of 2303 U.S. adults, 734 believe in UFOs. (Adapted from
Harris Interactive)


EXTENDING CONCEPTS


Newspaper Surveys In Exercises 29 and 30, translate the newspaper excerpt
into a confidence interval for p. Approximate the level of confidence. 


29. In a survey of 8451 U.S. adults, 31.4% said they were taking vitamin E as a 
supplement. The survey’s margin of error is plus or minus 1%. (Source: 
Decision Analyst, Inc.) 


30. In a survey of 1000 U.S. adults, 19% are concerned that their taxes will be 
audited by the Internal Revenue Service. The survey’s margin of error is plus 
or minus 3%. (Source: Rasmussen Reports) 


31. Why Check It? Why is it necessary to check that $n\hat{p} \ge 5$ and $n\hat{q} \ge 5$?


32. Sample Size The equation for determining the sample size

$$n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2$$

can be obtained by solving the equation for the margin of error

$$E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

for n. Show that this is true and justify each step.


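33. Maximum Value of $\hat{p}\hat{q}$ Complete the tables for different values of $\hat{p}$ and
$\hat{q} = 1 - \hat{p}$. From the tables, which value of $\hat{p}$ appears to give the maximum
value of the product $\hat{p}\hat{q}$?


First table (columns: $\hat{p}$, $\hat{q} = 1 - \hat{p}$, $\hat{p}\hat{q}$; blank entries are to be completed)

0.0    1.0    0.00
0.1    0.9    0.09
0.2    0.8
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0


Second table (columns: $\hat{p}$, $\hat{q} = 1 - \hat{p}$, $\hat{p}\hat{q}$; blank entries are to be completed)

0.45
0.46
0.47
0.48
0.49
0.50
0.51
0.52
0.53
0.54
0.55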
ACTIVITY 6.3


Confidence Intervals for a Proportion 


The confidence intervals for a proportion applet allows you to visually investigate 
confidence intervals for a population proportion. You can specify the sample size 
n and the population proportion p. When you click SIMULATE, 100 separate 
samples of size n will be selected from a population with a proportion of successes 
equal to p. For each of the 100 samples, a 95% confidence interval (in green) and 
a 99% confidence interval (in blue) are displayed in the plot at the right. Each of 
these intervals is computed using the standard normal approximation. If an 
interval does not contain the population proportion, it is displayed in red. Note 
that the 99% confidence interval is always wider than the 95% confidence 
interval. Additional simulations can be carried out by clicking SIMULATE 
multiple times. The cumulative number of times that each type of interval 
contains the population proportion is also shown. Press CLEAR to clear existing 
results and start a new simulation. 


Explore


Step 1 Specify a value for n.
Step 2 Specify a value for p.
Step 3 Click SIMULATE to generate the confidence intervals.


Draw Conclusions


1. Run the simulation for p = 0.6 and n = 10, 20, 40, and 100. Clear the results 
after each trial. What proportion of the confidence intervals for each 
confidence level contains the population proportion? What happens to the 
proportion of confidence intervals that contains the population proportion for 
each confidence level as the sample size increases? 


2. Run the simulation for p = 0.4 and n = 100 so that at least 1000 confidence 
intervals are generated. Compare the proportion of confidence intervals that 
contains the population proportion for each confidence level. Is this what you 
would expect? Explain. 
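
A rough stand-in for the applet can be written in a few lines. The sketch below is an assumption-laden illustration rather than the applet's actual code: it draws binomial samples, builds the normal-approximation interval from each one, and reports the proportion of intervals that contain p. The function name `coverage`, the seed, and the default number of trials are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(p, n, c, trials=2000, seed=1):
    """Proportion of c-confidence intervals that contain p."""
    rng = random.Random(seed)
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    hits = 0
    for _ in range(trials):
        # Number of successes in n independent trials.
        x = sum(rng.random() < p for _ in range(n))
        p_hat = x / n
        e = z_c * sqrt(p_hat * (1 - p_hat) / n)
        hits += p_hat - e <= p <= p_hat + e
    return hits / trials


# Settings from Draw Conclusions 2: p = 0.4 and n = 100.
cov_95 = coverage(0.4, 100, 0.95)
cov_99 = coverage(0.4, 100, 0.99)
print(cov_95, cov_99)
```

Because both calls use the same seed, they see the same samples, so every sample whose 95% interval contains p also has a 99% interval that contains p. Repeating the run with n = 10, 20, and 40 gives a feel for Draw Conclusions 1.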
WHAT YOU SHOULD LEARN


> How to interpret the
chi-square distribution and use
a chi-square distribution table


> How to use the chi-square
distribution to construct a
confidence interval for
the variance and standard
deviation


STUDY TIP 


The Greek letter $\chi$ is
pronounced “ki,” which
rhymes with the more
familiar Greek letter $\pi$.


Confidence Intervals for Variance and Standard Deviation 


The Chi-Square Distribution > Confidence Intervals for $\sigma^2$ and $\sigma$


> THE CHI-SQUARE DISTRIBUTION 


In manufacturing, it is necessary to control the amount that a process varies. For 
instance, an automobile part manufacturer must produce thousands of parts to be 
used in the manufacturing process. It is important that the parts vary little or not 
at all. How can you measure, and consequently control, the amount of variation 
in the parts? You can start with a point estimate. 


DEFINITION 


The point estimate for $\sigma^2$ is $s^2$ and the point estimate for $\sigma$ is $s$. The most
unbiased estimate for $\sigma^2$ is $s^2$.


You can use a chi-square distribution to construct a confidence interval for 
the variance and standard deviation. 


DEFINITION 


If a random variable x has a normal distribution, then the distribution of

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$

forms a chi-square distribution for samples of any size n > 1. Four
properties of the chi-square distribution are as follows.

1. All chi-square values $\chi^2$ are greater than or equal to 0.


2. The chi-square distribution is a family of curves, each determined by
the degrees of freedom. To form a confidence interval for $\sigma^2$, use the
$\chi^2$-distribution with degrees of freedom equal to one less than
the sample size.


d.f. = n - 1    Degrees of freedom


3. The area under each curve of the chi-square distribution equals 1. 
4. Chi-square distributions are positively skewed. 


Chi-Square Distributions 
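
The properties listed above can be checked informally by simulation. The following sketch rests on assumptions: it draws repeated normal samples, computes $(n - 1)s^2/\sigma^2$ for each, and summarizes the results. The function name `chi_square_statistics`, the sample size, the seed, and the number of trials are arbitrary choices for illustration. It also relies on the standard fact that a chi-square distribution has a mean equal to its degrees of freedom.

```python
import random
import statistics


def chi_square_statistics(n, sigma, trials=5000, seed=1):
    """Simulate (n - 1) * s^2 / sigma^2 for normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, divisor n - 1
        values.append((n - 1) * s2 / sigma ** 2)
    return values


values = chi_square_statistics(n=10, sigma=2.0)
print(min(values) >= 0)                  # property 1: no negative values
print(statistics.fmean(values))          # should be near d.f. = 9
print(statistics.median(values))         # below the mean: positive skew
```

A mean that sits above the median is one simple sign of the positive skew named in property 4.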
There are two critical values for each level of confidence. The value $\chi^2_R$
represents the right-tail critical value and $\chi^2_L$ represents the left-tail critical value.


Table 6 in Appendix B lists critical values of $\chi^2$ for various degrees of freedom
and areas. Each area in the table represents the region under the chi-square curve
to the right of the critical value.


STUDY TIP

For chi-square critical values
with a c-confidence level, the
following values are what you
look up in Table 6 in Appendix B:
the area $\frac{1 - c}{2}$ to the right of $\chi^2_R$ and
the area $\frac{1 + c}{2}$ to the right of $\chi^2_L$.


EXAMPLE 1 


> Finding Critical Values for $\chi^2$
Find the critical values $\chi^2_R$ and $\chi^2_L$ for a 95% confidence interval when the
sample size is 18.


> Solution 
Because the sample size is 18, there are

d.f. = n - 1 = 18 - 1 = 17 degrees of freedom.

The areas to the right of $\chi^2_R$ and $\chi^2_L$ are

$$\text{Area to right of } \chi^2_R = \frac{1 - c}{2} = \frac{1 - 0.95}{2} = 0.025$$

and

$$\text{Area to right of } \chi^2_L = \frac{1 + c}{2} = \frac{1 + 0.95}{2} = 0.975.$$

The result is that you can conclude that the area between the left
and right critical values is c.


Part of Table 6 is shown. Using d.f. = 17 and the areas 0.975 and 0.025, you can
find the critical values, as shown by the highlighted areas in the table.


[Table excerpt: part of Table 6, chi-square critical values by degrees of freedom and area to the right. In the row for d.f. = 17, the entry under 0.975 is 7.564 and the entry under 0.025 is 30.191.]


From the table, you can see that $\chi^2_R = 30.191$ and $\chi^2_L = 7.564$.


Interpretation So, 95% of the area under the curve lies between 7.564 and 
30.191. 


> Try It Yourself 1 

Find the critical values $\chi^2_R$ and $\chi^2_L$ for a 90% confidence interval when the
sample size is 30.

a. Identify the degrees of freedom and the level of confidence.

b. Find the areas to the right of $\chi^2_R$ and $\chi^2_L$.

c. Use Table 6 in Appendix B to find $\chi^2_R$ and $\chi^2_L$.

d. Interpret the results. Answer: Page A40
The Florida panther is one of
the most endangered mammals 
on Earth. In the southeastern 
United States, the only breeding 
population (about 100) can be 
found on the southern tip of 
Florida. Most of the panthers live 
in (1) the Big Cypress National 
Preserve, (2) Everglades National 
Park, and (3) the Florida Panther 
National Wildlife Refuge, as 
shown on the map. In a recent 
study of 19 female panthers, 

it was found that the mean 
litter size was 2.4 kittens, with 

a standard deviation of 0.9. 
(Source: U.S. Fish & Wildlife Service) 


Construct a 90% confidence 
interval for the standard 
deviation of the litter size for 
female Florida panthers. 
Assume the litter sizes are 
normally distributed. 
> CONFIDENCE INTERVALS FOR $\sigma^2$ AND $\sigma$


You can use the critical values $\chi^2_R$ and $\chi^2_L$ to construct confidence intervals for
a population variance and standard deviation. The best point estimate for the
variance is $s^2$ and the best point estimate for the standard deviation is $s$.


DEFINITION


The c-confidence intervals for the population variance and standard deviation
are as follows.


Confidence Interval for $\sigma^2$:

$$\frac{(n - 1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_L}$$

Confidence Interval for $\sigma$:

$$\sqrt{\frac{(n - 1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n - 1)s^2}{\chi^2_L}}$$


The probability that the confidence intervals contain $\sigma^2$ or $\sigma$ is c.


GUIDELINES


Constructing a Confidence Interval for a Variance and Standard Deviation


IN WORDS, with IN SYMBOLS beneath each step

1. Verify that the population has a normal distribution.

2. Identify the sample statistic n and the degrees of freedom.
   d.f. = n - 1

3. Find the point estimate $s^2$.
   $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

4. Find the critical values $\chi^2_R$ and $\chi^2_L$ that correspond to the given
   level of confidence c.
   Use Table 6 in Appendix B.

5. Find the left and right endpoints and form the confidence interval for the
   population variance.
   Left Endpoint: $\frac{(n - 1)s^2}{\chi^2_R}$
   Right Endpoint: $\frac{(n - 1)s^2}{\chi^2_L}$

6. Find the confidence interval for the population standard deviation by taking
   the square root of each endpoint.
   $\sqrt{\frac{(n - 1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n - 1)s^2}{\chi^2_L}}$
STUDY TIP


When a confidence interval 
for a population variance 
or standard deviation is 
computed, the general 
round-off rule is to round 
off to the same number 
of decimal places given 
for the sample variance 
or standard deviation. 


EXAMPLE 2


> Constructing a Confidence Interval 

You randomly select and weigh 30 samples of an allergy medicine. The sample 
standard deviation is 1.20 milligrams. Assuming the weights are normally 
distributed, construct 99% confidence intervals for the population variance 
and standard deviation. 


> Solution 
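The areas to the right of $\chi^2_R$ and $\chi^2_L$ are

$$\text{Area to right of } \chi^2_R = \frac{1 - c}{2} = \frac{1 - 0.99}{2} = 0.005$$

and

$$\text{Area to right of } \chi^2_L = \frac{1 + c}{2} = \frac{1 + 0.99}{2} = 0.995.$$

Using the values n = 30, d.f. = 29, and c = 0.99, the critical values $\chi^2_R$ and
$\chi^2_L$ are

$$\chi^2_R = 52.336 \quad \text{and} \quad \chi^2_L = 13.121.$$

Using these critical values and s = 1.20, the confidence interval for $\sigma^2$ is as
follows.


Left Endpoint:

$$\frac{(n - 1)s^2}{\chi^2_R} = \frac{(30 - 1)(1.20)^2}{52.336} \approx 0.80$$

Right Endpoint:

$$\frac{(n - 1)s^2}{\chi^2_L} = \frac{(30 - 1)(1.20)^2}{13.121} \approx 3.18$$

$$0.80 < \sigma^2 < 3.18$$


The confidence interval for $\sigma$ is

$$\sqrt{\frac{(30 - 1)(1.20)^2}{52.336}} < \sigma < \sqrt{\frac{(30 - 1)(1.20)^2}{13.121}}$$

$$0.89 < \sigma < 1.78.$$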


Interpretation With 99% confidence, you can say that the population 
variance is between 0.80 and 3.18, and the population standard deviation is 
between 0.89 and 1.78 milligrams. 
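
The arithmetic in Example 2 can be checked with a short function. This is a sketch under stated assumptions: the name `variance_ci` is hypothetical, and the two critical values are passed in as read from Table 6 rather than computed, because the Python standard library has no chi-square quantile function.

```python
from math import sqrt


def variance_ci(n, s, chi2_right, chi2_left):
    """Return ((var_left, var_right), (sd_left, sd_right)).

    chi2_right and chi2_left are the right-tail and left-tail critical
    values for d.f. = n - 1, taken from a chi-square table.
    """
    numerator = (n - 1) * s ** 2
    var_left = numerator / chi2_right   # dividing by the larger value
    var_right = numerator / chi2_left   # dividing by the smaller value
    return (var_left, var_right), (sqrt(var_left), sqrt(var_right))


# Example 2: n = 30, s = 1.20, critical values 52.336 and 13.121.
var_ci, sd_ci = variance_ci(30, 1.20, 52.336, 13.121)
print([round(v, 2) for v in var_ci])
print([round(v, 2) for v in sd_ci])
```

Note that the larger critical value produces the left endpoint, because it appears in the denominator.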


> Try It Yourself 2 


Find the 90% and 95% confidence intervals for the population variance and 
standard deviation of the medicine weights. 


a. Find the critical values $\chi^2_R$ and $\chi^2_L$ for each confidence interval.


b. Use n, s, $\chi^2_R$, and $\chi^2_L$ to find the left and right endpoints for each confidence
interval for the population variance.
c. Find the square roots of the endpoints of each confidence interval.
d. Specify the 90% and 95% confidence intervals for the population variance
and standard deviation. Answer: Page A40
6.4 Exercises


BUILDING BASIC SKILLS AND VOCABULARY


1. Does a population have to be normally distributed in order to use the 
chi-square distribution? 


2. What happens to the shape of the chi-square distribution as the degrees of
freedom increase?


In Exercises 3-8, find the critical values $\chi^2_R$ and $\chi^2_L$ for the given confidence level
c and sample size n.


3. c = 0.90, n = 8 4. c = 0.99, n = 15 
5. c = 0.95, n = 20 6. c = 0.98, n = 26 
7. c = 0.99, n = 30 8. c = 0.80, n = 51 


USING AND INTERPRETING CONCEPTS


Constructing Confidence Intervals In Exercises 9-24, assume each sample
is taken from a normally distributed population and construct the indicated
confidence intervals for (a) the population variance $\sigma^2$ and (b) the population
standard deviation $\sigma$. Interpret the results.


9. Vitamins To analyze the variation in weights of vitamin supplement tablets, 
you randomly select and weigh 14 tablets. The results (in milligrams) are 
shown. Use a 90% level of confidence. 


500.000 499.995 500.010 499.997 500.015 
499.988 500.000 499.996 500.020 500.002 
499.998 499.996 500.003 500.000 


10. Cough Syrup You randomly select and measure the volumes of the 
contents of 15 bottles of cough syrup. The results (in fluid ounces) are shown. 
Use a 90% level of confidence. 


4.211 4.246 4.269 4.241 4.260 
4.293 4.189 4.248 4.220 4.239 
4.253 4.209 4.300 4.256 4.290 


11. Car Batteries The reserve capacities (in hours) of 18 randomly selected 
automotive batteries are shown. Use a 99% level of confidence. (Adapted from 
Consumer Reports) 


1.70 1.60 1.94 1.58 1.74 1.60 
1.86 1.72 1.38 1.46 1.64 1.49
1.55 1.70 1.75 0.88 1.77 2.07 


12. Bolts You randomly select and measure the lengths of 17 bolts. The results 
(in inches) are shown. Use a 95% level of confidence. 


1.286 1.138 1.240 1.132 1.381 1.137 
1.300 1.167 1.240 1.401 1.241 1.171 
1.217 1.360 1.302 1.331 1.383 
13. LCD TVs A magazine includes a report on the energy costs per year for
32-inch liquid crystal display (LCD) televisions. The article states that 14 
randomly selected 32-inch LCD televisions have a sample standard deviation 
of $3.90. Use a 99% level of confidence. (Adapted from Consumer Reports) 


14. A magazine includes a report on the prices of subcompact
digital cameras. The article states that 11 randomly selected subcompact 
digital cameras have a sample standard deviation of $109. Use an 80% level 
of confidence. (Adapted from Consumer Reports) 


15. Spring Break As part of your spring break planning, you randomly select
10 hotels in Cancun, Mexico, and record the room rate for each hotel. The
results are shown in the stem-and-leaf plot. Use a 98% level of confidence.
(Source: Expedia, Inc.)


 6 | 9          Key: 7|4 = 74
 7 | 4
 8 |
 9 | 0 9 9
10 |
11 | 2
12 |
13 | 6 9
14 | 9
15 | 0


16. The weights (in pounds) of a random sample of 14 cordless
drills are shown in the stem-and-leaf plot. Use a 99% level of confidence.
(Adapted from Consumer Reports)


3 | 4 6 9          Key: 3|4 = 3.4
4 | 6 8 9
5 | 1 3 4 5 7 9
6 | 0 1


17. The pulse rates of a random sample of 16 adults are shown in
the dot plot. Use a 95% level of confidence.


[Dot plot: Pulse Rates; horizontal axis: Beats per minute, 60 to 85]


18. Blu-Ray™ Players The prices of a random sample of 27 Blu-ray™ 
players are shown in the dot plot. Use a 98% level of confidence. 
(Adapted from Consumer Reports) 


[Dot plot: Blu-Ray™ Players; horizontal axis: Prices]




19. Water Quality As part of a water quality
survey, you test the water hardness in several
randomly selected streams. The results are
shown in the figure. Use a 95% level of
confidence.


s = 15 grains/gallon 4


20. Website Costs As part of a survey, you ask a
random sample of business owners how much
they would be willing to pay for a website for
their company. The results are shown in the
figure. Use a 90% level of confidence.


How much will you
pay for your site?


i= 310)
s = $3600


21. Annual Earnings The annual earnings of 14 randomly selected computer
software engineers have a sample standard deviation of $3725. Use an 80%
level of confidence. (Adapted from U.S. Bureau of Labor Statistics)


22. Annual Precipitation The average annual precipitations (in inches) of a
random sample of 30 years in San Francisco, California have a sample
standard deviation of 8.18 inches. Use a 98% level of confidence. (Source:
Golden Gate Weather Services)


23. Waiting Times The waiting times (in minutes) of a random sample of 22
people at a bank have a sample standard deviation of 3.6 minutes. Use a 98%
level of confidence.


24. Motorcycles The prices of a random sample of 20 new motorcycles have a
sample standard deviation of $3900. Use a 90% level of confidence.


In Exercises 25-28, use StatCrunch to help you construct the indicated
confidence intervals for the population variance $\sigma^2$ and the population standard
deviation $\sigma$. Assume each sample is taken from a normally distributed population.


25. c = 0.95, $s^2$ = 11.56, n = 30 26. c = 0.99, $s^2$ = 0.64, n = 7
27. c = 0.90, s = 35, n = 18 28. c = 0.97, s = 278.1, n = 45


EXTENDING CONCEPTS


29. Vitamin Tablet Weights You are analyzing the sample of vitamin
supplement tablets in Exercise 9. The population standard deviation of the
tablets’ weights should be less than 0.015 milligram. Does the confidence
interval you constructed for $\sigma$ suggest that the variation in the tablets’
weights is at an acceptable level? Explain your reasoning.


30. Cough Syrup Bottle Contents You are analyzing the sample of cough syrup
bottles in Exercise 10. The population standard deviation of the volumes of
the bottles’ contents should be less than 0.025 fluid ounce. Does the confidence
interval you constructed for $\sigma$ suggest that the variation in the volumes of the
bottles’ contents is at an acceptable level? Explain your reasoning.


31. In your own words, explain how finding a confidence interval for a
population variance is different from finding a confidence interval for a
population mean or proportion.
USES AND ABUSES


Uses 


By now, you know that complete information about population parameters is 
often not available. The techniques of this chapter can be used to make interval 
estimates of these parameters so that you can make informed decisions. 

From what you learned in this chapter, you know that point estimates 
(sample statistics) of population parameters are usually close but rarely equal to 
the actual values of the parameters they are estimating. Remembering this can 
help you make good decisions in your career and in everyday life. For instance, 
suppose the results of a survey tell you that 52% of the population plans to vote 
in favor of the rezoning of a portion of a town from residential to commercial use. 
You know that this is only a point estimate of the actual proportion that will vote 
in favor of rezoning. If the interval estimate is 0.49 < p < 0.55, then you know 
this means it is possible that the item will not receive a majority vote. 
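As a small illustration of the rezoning example, the sketch below recovers the point estimate and margin of error from the interval endpoints and checks whether a majority is assured. The function name `summarize_interval` is hypothetical.

```python
def summarize_interval(lower, upper):
    """Recover the point estimate and margin of error from an interval
    estimate, and report whether the whole interval lies above 50%."""
    point_estimate = (lower + upper) / 2
    margin_of_error = (upper - lower) / 2
    majority_assured = lower > 0.5
    return point_estimate, margin_of_error, majority_assured

# Interval from the text: 0.49 < p < 0.55
p_hat, E, majority_assured = summarize_interval(0.49, 0.55)
print(f"point estimate = {p_hat:.2f}, margin of error = {E:.2f}")
print("majority assured:", majority_assured)
```

Because the left endpoint is below 0.50, the interval includes proportions that would not produce a majority, which is the caution the paragraph raises.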


Abuses 


Unrepresentative Samples There are many ways that surveys can result in 
incorrect predictions. When you read the results of a survey, remember to 
question the sample size, the sampling technique, and the questions asked. For 
instance, suppose you want to know the proportion of people who will vote in 
favor of rezoning. From the diagram below, you can see that even if your sample 
is large enough, it may not consist of actual voters. 


[Diagram: Registered voters; Actual voters]


Using a small sample might be the only way to make an estimate, but be 
aware that a change in one data value may completely change the results. 
Generally, the larger the sample size, the more accurate the results will be. 


Biased Survey Questions In surveys, it is also important to analyze the wording 
of the questions. For instance, the question about rezoning might be presented as: 
“Knowing that rezoning will result in more businesses contributing to school 
taxes, would you support the rezoning?” 


EXERCISES


1. Unrepresentative Samples Find an example of a survey that is reported 
in a newspaper, magazine, or on a website. Describe different ways that the 
sample could have been unrepresentative of the population. 


2. Biased Survey Questions Find an example of a survey that is reported in a 
newspaper, magazine, or on a website. Describe different ways that the survey 
questions could have been biased. 
CHAPTER SUMMARY


What did you learn?                                    EXAMPLE(S)    REVIEW EXERCISES

Section 6.1

■ How to find a point estimate and a margin of error       1, 2          1, 2

  $$E = z_c \frac{\sigma}{\sqrt{n}} \quad \text{(margin of error)}$$

■ How to construct and interpret confidence intervals      3-5           3-6
  for the population mean

  $$\bar{x} - E < \mu < \bar{x} + E$$

■ How to determine the minimum sample size required        6             7-10
  when estimating $\mu$

Section 6.2

■ How to interpret the t-distribution and use a            1             11-16
  t-distribution table

  $$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

■ How to construct confidence intervals when n < 30,       2-4           17-26
  the population is normally distributed, and $\sigma$ is unknown

  $$\bar{x} - E < \mu < \bar{x} + E, \quad E = t_c \frac{s}{\sqrt{n}}$$

Section 6.3

■ How to find a point estimate for a population            1             27-34
  proportion

  $$\hat{p} = \frac{x}{n}$$

■ How to construct a confidence interval for a             2, 3          35-42
  population proportion

  $$\hat{p} - E < p < \hat{p} + E, \quad E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

■ How to determine the minimum sample size required        4             43, 44
  when estimating a population proportion

Section 6.4

■ How to interpret the chi-square distribution and use     1             45-48
  a chi-square distribution table

  $$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

■ How to use the chi-square distribution to construct      2             49-52
  a confidence interval for the variance and standard deviation

  $$\frac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L}, \quad \sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}}$$
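The margin-of-error and minimum-sample-size ideas from Sections 6.1 and 6.3 can be sketched in Python as follows. All function names are hypothetical. The sample-size formulas n = (z_c σ / E)² and n = p̂q̂ (z_c / E)² are the standard ones and are assumed here, since the summary lists the skill without restating them. Critical values are computed with the standard library's `statistics.NormalDist` rather than read from a table, so results can differ slightly from table-based answers.

```python
import math
from statistics import NormalDist

def z_critical(c):
    """Critical value z_c such that the area between -z_c and z_c is c."""
    return NormalDist().inv_cdf((1 + c) / 2)

def margin_of_error_mean(c, sigma, n):
    """E = z_c * sigma / sqrt(n)."""
    return z_critical(c) * sigma / math.sqrt(n)

def margin_of_error_proportion(c, p_hat, n):
    """E = z_c * sqrt(p_hat * q_hat / n)."""
    return z_critical(c) * math.sqrt(p_hat * (1 - p_hat) / n)

def min_sample_size_mean(c, sigma, E):
    """Smallest n with margin of error at most E when estimating a mean."""
    return math.ceil((z_critical(c) * sigma / E) ** 2)

def min_sample_size_proportion(c, E, p_hat=0.5):
    """Smallest n for a proportion; p_hat = 0.5 is the conservative
    choice when no preliminary estimate is available."""
    return math.ceil(p_hat * (1 - p_hat) * (z_critical(c) / E) ** 2)

# Illustrative (hypothetical) inputs: 95% confidence, sigma = 10, n = 64
print(round(margin_of_error_mean(0.95, 10, 64), 2))
```

Sample sizes are rounded up with `math.ceil`, because rounding down would give a margin of error larger than the one requested.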
REVIEW EXERCISES


SECTION 6.1


In Exercises 1 and 2, find (a) the point estimate of the population mean $\mu$ and
(b) the margin of error for a 90% confidence interval.


1. Waking times of 40 people who start work at 8:00 A.M. (in minutes past
5:00 A.M.)


135 145 95 140 135 95 110 50
90 165 110 125 80 125 130 110
25 75 65 100 60 125 115 135
95 90 140 40 75 50 130 85
100 160 135 45 135 115 75 130


2. Lengths of commutes to work of 32 people (in miles)


2 9 7 28 FT 3 27 


21 10 13 3 7 2 30 7 
6 13 6 14 4 1 10 3 
13 6 2 9 2 12 16 18


In Exercises 3 and 4, construct the indicated confidence interval for the
population mean $\mu$. If convenient, use technology to construct the confidence
interval.


3. c = 0.99, $\bar{x}$ = 15.8, s = 0.85, n = 80
4. c = 0.95, $\bar{x}$ = 7.675, s = 0.105, n = 55


In Exercises 5 and 6, use the given confidence interval to find the margin of error 
and the sample mean. 


5. (20.75, 24.10) 6. (7.428, 7.562) 

In Exercises 7-10, determine the minimum sample size n needed to estimate $\mu$.


7. Use the results of Exercise 1. Determine the minimum survey size that is 
necessary to be 95% confident that the sample mean waking time is within 
10 minutes of the actual mean waking time. 


8. Use the results of Exercise 1. Now suppose you want 99% confidence with a 
margin of error of 2 minutes. How many people would you need to survey? 


9. Use the results of Exercise 2. Determine the minimum survey size that is 
necessary to be 95% confident that the sample mean length of commutes to 
work is within 2 miles of the actual mean length of commutes to work. 


10. Use the results of Exercise 2. Now suppose you want 98% confidence with a 
margin of error of 0.5 mile. How many people would you need to survey? 


SECTION 6.2


In Exercises 11-14, find the critical value $t_c$ for the given confidence level c and
sample size n.


11. c = 0.80, n = 10 12. c = 0.95, n = 24
13. c = 0.98, n = 15 14. c = 0.99, n = 30


15. Consider a 90% confidence interval for $\mu$. Assume $\sigma$ is not known. For which
sample size, n = 20 or n = 30, is the critical value $t_c$ larger?

16. Consider a 90% confidence interval for $\mu$. Assume $\sigma$ is not known. For which
sample size, n = 20 or n = 30, is the confidence interval wider?


In Exercises 17-20, find the margin of error for $\mu$.


17. c = 0.90, s = 25.6, n = 16, $\bar{x}$ = 72.1
18. c = 0.95, s = 1.1, n = 25, $\bar{x}$ = 3.5
19. c = 0.98, s = 0.9, n = 12, $\bar{x}$ = 6.8
20. c = 0.99, s = 16.5, n = 20, $\bar{x}$ = 25.2


In Exercises 21-24, construct the confidence interval for $\mu$ using the statistics from
the given exercise. If convenient, use technology to construct the confidence interval.


21. Exercise 17 22. Exercise 18
23. Exercise 19 24. Exercise 20


25. In a random sample of 28 sports cars, the average annual fuel cost was $2218
and the standard deviation was $523. Construct a 90% confidence interval
for $\mu$. Assume the annual fuel costs are normally distributed. (Adapted from
U.S. Department of Energy)


26. Repeat Exercise 25 using a 99% confidence interval.


SECTION 6.3


In Exercises 27-34, let p be the proportion of the population who respond yes.
Use the given information to find $\hat{p}$ and $\hat{q}$.


27. A survey asks 1500 U.S. adults if they will participate in the 2010 Census. The
results are shown in the pie chart. (Adapted from Pew Research Center)


28. In a survey of 500 U.S. adults, 425 say they would trust doctors to tell the
truth. (Adapted from Harris Interactive)


29. In a survey of 1023 U.S. adults, 552 say they have worked the night shift at
some point in their lives. (Adapted from CNN/Opinion Research)


30. In a survey of 800 U.S. adults, 90 are making the minimum payment(s) on
their credit card(s). (Adapted from Cambridge Consumer Credit Index)


31. In a survey of 1008 U.S. adults, 141 say the cost of health care is the most
important financial problem facing their family today. (Adapted from
Gallup, Inc.)


32. In a survey of 938 U.S. adults, 235 say the phrase “you know” is the most
annoying conversational phrase. (Adapted from Marist Poll)


33. In a survey of 706 parents with kids 4 to 8 years old, 346 say that they know
their state booster seat law. (Adapted from Knowledge Networks, Inc.)


34. In a survey of 2365 U.S. adults, 1230 say they worry most about missing
deductions when filing their taxes. (Adapted from USA TODAY)


In Exercises 35-42, construct the indicated confidence interval for the
population proportion p. If convenient, use technology to construct the confidence
interval. Interpret the results.


35. Use the sample in Exercise 27 with c = 0.95.
36. Use the sample in Exercise 28 with c = 0.99.
37. Use the sample in Exercise 29 with c = 0.90.
38. Use the sample in Exercise 30 with c = 0.98.
39. Use the sample in Exercise 31 with c = 0.99.
40. Use the sample in Exercise 32 with c = 0.90.
41. Use the sample in Exercise 33 with c = 0.80.
42. Use the sample in Exercise 34 with c = 0.98.


43. You wish to estimate, with 95% confidence, the population proportion of 
U.S. adults who think they should be saving more money. Your estimate
must be accurate within 5% of the population proportion. 

(a) No preliminary estimate is available. Find the minimum sample size 
needed. 

(b) Find the minimum sample size needed, using a prior study that found 
that 63% of U.S. adults think that they should be saving more money. 
(Source: Pew Research Center) 


(c) Compare the results from parts (a) and (b). 


44. Repeat Exercise 43 part (b), using a 99% confidence level and a margin of
error of 2.5%. How does this sample size compare with your answer from 
Exercise 43 part (b)? 


SECTION 6.4


In Exercises 45-48, find the critical values $\chi^2_R$ and $\chi^2_L$ for the given confidence level
c and sample size n.

45. c = 0.95,n = 13 46. c = 0.98, n = 25 

47. c = 0.90,n = 8 48. c = 0.99,n = 10 

In Exercises 49-52, construct the indicated confidence intervals for the population
variance $\sigma^2$ and the population standard deviation $\sigma$. Assume each sample is
taken from a normally distributed population.


49. A random sample of the weights (in ounces) of 17 superzoom digital cameras
is shown in the stem-and-leaf plot. Use a 95% level of confidence. (Adapted
from Consumer Reports)


0 | 7 8 8 9          Key: 1|3 = 13
1 | 0 1 3 4 5 5 5 7 7 9
2 | 1 4
3 | 5


50. Repeat Exercise 49 using a 99% level of confidence. Interpret the results and 
compare with Exercise 49. 


51. A random sample of the acceleration times (in seconds) from 0 to 60
miles per hour for 26 sedans is shown in the dot plot. Use a 98% level 
of confidence. (Adapted from Consumer Reports) 


[Dot plot: Acceleration Times From 0-60 Miles Per Hour for Sedans; horizontal axis: Time (in seconds)]


52. Repeat Exercise 51 using a 90% level of confidence. Interpret the results and 
compare with Exercise 51. 
CHAPTER QUIZ


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


1. The following data set represents the amounts of time (in minutes) spent
watching online videos each day for a random sample of 30 college
students. (Adapted from the Council for Research Excellence)


5.0 6.25 8.0 5.5 4.75 4.5 7.2 6.6 5.8 5.5
4.2 5.4 6.75 9.8 8.2 6.4 7.8 6.5 5.5 6.0
3.8 6.75 9.25 10.0 9.6 7.2 6.4 6.8 9.8 10.2


(a) Find the point estimate of the population mean. 


(b) Find the margin of error for a 95% level of confidence. Interpret the 
result. 


(c) Construct a 95% confidence interval for the population mean. 
Interpret the results. 


2. You want to estimate the mean time college students spend watching online 
videos each day. The estimate must be within 1 minute of the population 
mean. Determine the required sample size to construct a 99% confidence 
interval for the population mean. Assume the population standard deviation 
is 2.4 minutes. 


3. The following data set represents the average number of minutes played for a 
random sample of professional basketball players in a recent season. (Source: 
ESPN) 


35.9 33.8 34.7 31.5 33.2 29.1 30.7 31.2 36.1 34.9 


(a) Find the sample mean and the sample standard deviation. 


(b) Construct a 90% confidence interval for the population mean and 
interpret the results. Assume the population of the data set is normally 
distributed. 


(c) Repeat part (b), assuming $\sigma$ = 5.25 minutes per game. Interpret and
compare the results. 


4. In a random sample of seven aerospace engineers, the mean monthly income 
was $6824 and the standard deviation was $340. Assume the monthly incomes 
are normally distributed and construct a 95% confidence interval for the 
population mean monthly income for aerospace engineers. (Adapted from U.S. 
Bureau of Labor Statistics) 


5. In a survey of 1383 U.S. adults, 1079 favor increasing federal funding for
research on wind, solar, and hydrogen technology. (Adapted from Pew Research 
Center) 


(a) Find a point estimate for the population proportion p of those in favor 
of increasing federal funding for research on wind, solar, and hydrogen 
technology. 


(b) Construct a 90% confidence interval for the population proportion. 


(c) Find the minimum sample size needed to estimate the population 
proportion at the 99% confidence level in order to ensure that the 
estimate is accurate within 4% of the population proportion. 


6. Refer to the data set in Exercise 1. Assume the population of times spent 
watching online videos each day is normally distributed. 


(a) Construct a 95% confidence interval for the population variance.
(b) Construct a 95% confidence interval for the population standard deviation.
REAL STATISTICS - REAL DECISIONS

In 1974, the Safe Drinking Water Act was passed “to protect public health
by regulating the nation’s public drinking water supply.” In accordance with
the act, the Environmental Protection Agency (EPA) has regulations that
limit the levels of contaminants in drinking water supplied by water
utilities. These utilities are required to supply water quality reports to their
customers annually. These reports discuss the source of the water, its
treatment, and the results of water quality monitoring that is performed
daily. The results of this monitoring indicate whether or not drinking water
is healthy enough for consumption.

A water department tests for contaminants at water treatment
plants and at customers’ taps. These regulated parameters include
microorganisms, organic chemicals, and inorganic chemicals. For
instance, cyanide is an inorganic chemical that is regulated. Its presence
in drinking water is the result of discharges from steel, plastics, and
fertilizer factories. The maximum contaminant level for cyanide is set at
0.2 part per million.

You work for a city’s water department and are interpreting the results
shown in the graph at the right. The graph shows the point estimates for the
population mean concentration and the 95% confidence intervals for $\mu$
for cyanide over a three-year period. The data are based on random water
samples taken by the city’s three water treatment plants.

[Graph: Cyanide; point estimates and 95% confidence intervals by Year]


1. Interpreting the Results 
Use the graph to decide if there has been a change in the mean 
concentration level of cyanide for the given years. Explain your reasoning. 
(a) From Year 1 to Year 2 (b) From Year 2 to Year 3 
(c) From Year 1 to Year 3 


2. What Can You Conclude? 


Using the results of Exercise 1, what can you conclude about the 
concentrations of cyanide in the drinking water? 


3. What Do You Think? 


The confidence interval for Year 2 is much wider than those for the other years.
What do you think may have caused this wider confidence interval?


4. How Do You Think They Did It? 


How do you think the water department constructed the 95% 
confidence intervals for the population mean concentration of 
cyanide in the water? Do the following to answer the question. (You 
do not need to make any calculations.) 


(a) What sampling distribution do you think they used? Why? 


(b) Do you think they used the population standard deviation in 
calculating the margin of error? Why or why not? If not, what 
could they have used? 
TECHNOLOGY


THE GALLUP ORGANIZATION
WWW.GALLUP.COM


MOST ADMIRED POLLS


Since 1946, the Gallup Organization has conducted
a “most admired” poll. The methodology for the
2009 poll is described at the right.


Survey Question
What man* that you have heard or read
about, living today in any part of the
world, do you admire most? And who is
your second choice?

*Survey respondents are asked an identical
question about most admired woman.

Reprinted with permission from GALLUP.


“Results are based on telephone interviews
with 1,025 national adults, aged 18 and older,
conducted Dec. 11-13, 2009. For results based
on the total sample of national adults, one can
say with 95% confidence that the maximum
margin of sampling error is ±4 percentage points.
Interviews are conducted with respondents on
land-line telephones (for respondents with
a land-line telephone) and cellular phones
(for respondents who are cell-phone only). In
addition to sampling error, question wording
and practical difficulties in conducting surveys
can introduce error or bias into the findings of
public opinion polls.”


EXERCISES


1. In 2009, the most named man was Barack
Obama at 30%. Use a technology tool to find
a 95% confidence interval for the population
proportion that would have chosen Barack
Obama.

2. In 2009, the most named woman was Hillary
Clinton at 16%. Use a technology tool to find
a 95% confidence interval for the population
proportion that would have chosen Hillary
Clinton.

3. Do the confidence intervals you obtained in
Exercises 1 and 2 agree with the statement
issued by the Gallup Organization that the
margin of error is ±4%? Explain.

4. The second most named woman was Sarah
Palin, who was named by 15% of the people in
the sample. Use a technology tool to find a 95%
confidence interval for the population proportion
that would have chosen Sarah Palin.

5. Use a technology tool to simulate a most
admired poll. Assume that the actual
population proportion who most admire Sarah
Palin is 18%. Run the simulation several times
using n = 1025.

(a) What was the least value you obtained for
$\hat{p}$?
(b) What was the greatest value you obtained
for $\hat{p}$?


MINITAB

Number of rows of data to generate: 200
Store in column(s): C1
Number of trials: 1025
Event probability: 0.18


6. Is it probable that the population proportion
who most admire Sarah Palin is 18% or
greater? Explain your reasoning.


Extended solutions are given in the Technology Supplement.
Technical instruction is provided for MINITAB, Excel, and the TI-83/84 Plus.
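For readers without MINITAB, the simulation described in Exercise 5 can be approximated with the Python standard library. This is a sketch under stated assumptions, not the book's method: the function name `simulate_polls` and the `seed` parameter are hypothetical, and each simulated poll draws 1025 independent yes/no responses with success probability 0.18, mirroring the MINITAB settings shown (200 rows, 1025 trials, event probability 0.18).

```python
import random

def simulate_polls(p=0.18, n=1025, polls=200, seed=None):
    """Simulate `polls` independent surveys of n respondents each and
    return the sample proportion from every survey."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(polls):
        # Count respondents who name the candidate in this simulated poll
        yes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(yes / n)
    return p_hats

p_hats = simulate_polls(seed=1)
print("least p-hat:   ", min(p_hats))
print("greatest p-hat:", max(p_hats))
```

Changing or omitting `seed` gives a different run each time, which corresponds to running the simulation several times.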
USING TECHNOLOGY TO CONSTRUCT
CONFIDENCE INTERVALS 


[MINITAB menu screens: Display Descriptive Statistics..., Store Descriptive Statistics..., Graphical Summary..., 1-Sample Z..., 1-Sample t..., 2-Sample t..., Paired t..., 1 Proportion..., 2 Proportions...]


Here are some MINITAB and TI-83/84 Plus printouts for some examples 
in this chapter. Answers may be slightly different because of rounding. 


(See Example 3, page 307.) 


140 105 130 97 80 165 232 110 214 201 122 98 65 88 
154 133 121 82 130 211 153 114 58 77 51 247 236 109 
126 132 125 149 122 74 59 218 192 90 117 105 


MINITAB 


One-Sample Z: Friends 


The assumed standard deviation = 53 


Variable N Mean StDev SE Mean 95% CI
Friends 40 130.80 52.63 8.38 (114.38, 147.22) 


(See Example 2, page 320.) 


MINITAB 


One-Sample T 
N Mean StDev SE Mean 95% CI
16 162.00 10.00 2.50 (156.67, 167.33) 


(See Example 2, page 329.) 


MINITAB 


Test and CI for One Proportion


Sample X N Sample p 95% CI
1 662 1000 0.662000 (0.631738, 0.691305)
TI-83/84 PLUS


EDIT CALC TESTS
1: Z-Test...
2: T-Test...
3: 2-SampZTest...
4: 2-SampTTest...
5: 1-PropZTest...
6: 2-PropZTest...
7: ZInterval...


TI-83/84 PLUS


ZInterval
Inpt: Data Stats
σ: 1.5
x̄: 22.9
n: 20
C-Level: .9
Calculate


TI-83/84 PLUS


ZInterval
(22.348, 23.452)
x̄ = 22.9
n = 20
(See Example 3, page 321.)


TI-83/84 PLUS


EDIT CALC TESTS
2: T-Test...
3: 2-SampZTest...
4: 2-SampTTest...
5: 1-PropZTest...
6: 2-PropZTest...
7: ZInterval...
8: TInterval...


TI-83/84 PLUS


TInterval
Inpt: Data Stats
x̄: 9.75
Sx: 2.39
n: 20
C-Level: .99
Calculate


TI-83/84 PLUS


TInterval
(8.2211, 11.279)
x̄ = 9.75
Sx = 2.39
n = 20
(See Example 2, page 329.)


TI-83/84 PLUS


EDIT CALC TESTS
5: 1-PropZTest...
6: 2-PropZTest...
7: ZInterval...
8: TInterval...
9: 2-SampZInt...
0: 2-SampTInt...
A: 1-PropZInt...


TI-83/84 PLUS


1-PropZInt
x: 662
n: 1000
C-Level: .95
Calculate


TI-83/84 PLUS


1-PropZInt
(.63268, .69132)
p̂ = 0.662
n = 1000
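The printouts in this section can be approximately reproduced in Python. This is a minimal sketch and not how MINITAB or the TI-83/84 Plus work internally. The function names are hypothetical, z critical values come from the standard library's `statistics.NormalDist`, and the t critical value is assumed to be read from a t-distribution table because the standard library has no t-distribution.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, c):
    """Confidence interval for a mean when sigma is known."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * sigma / math.sqrt(n)
    return x_bar - E, x_bar + E

def t_interval(x_bar, s, n, t_c):
    """Confidence interval for a mean when sigma is unknown.
    t_c is a table value for d.f. = n - 1 (assumed supplied by the reader)."""
    E = t_c * s / math.sqrt(n)
    return x_bar - E, x_bar + E

def one_prop_z_interval(x, n, c):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = x / n
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - E, p_hat + E

# Inputs taken from the printouts above
print(z_interval(130.80, 53, 40, 0.95))     # One-Sample Z: Friends
print(t_interval(162.00, 10.00, 16, 2.131)) # One-Sample T, d.f. = 15
print(one_prop_z_interval(662, 1000, 0.95)) # 1-PropZInt
```

The proportion interval from this sketch agrees with the TI-83/84 Plus screen; the MINITAB proportion printout differs slightly in the third decimal place, which suggests MINITAB uses a different interval method there.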
HYPOTHESIS TESTING WITH ONE SAMPLE


Introduction to Hypothesis Testing

Hypothesis Testing for the Mean (Large Samples)
    CASE STUDY

Hypothesis Testing for the Mean (Small Samples)
    ACTIVITY

Hypothesis Testing for Proportions
    ACTIVITY

Hypothesis Testing for Variance and Standard Deviation
    USES AND ABUSES
    REAL STATISTICS - REAL DECISIONS
    TECHNOLOGY

Computer software is protected by federal 
copyright laws. Each year, software companies 
lose billions of dollars because of pirated 
software. Federal criminal penalties for 
software piracy can include fines of up to 
$250,000 and jail terms of up to five years. 
WHERE YOU'VE BEEN


In Chapter 6, you began your study of inferential
statistics. There, you learned how to form a
confidence interval estimate about a population
parameter, such as the proportion of people in
the United States who agree with a certain
statement. For instance, in a nationwide poll
conducted by Harris Interactive on behalf of the
Business Software Alliance (BSA), U.S. students
ages 8 to 18 years were asked several questions
about their attitudes toward copyright law and
Internet behavior. Here are some of the results.


Survey Question                                   Number Surveyed   Number Who Said Yes

Have you ever downloaded music from                    1196               361
the Internet without paying for it?

Have you ever downloaded movies from                   1196                95
the Internet without paying for them?

Have you ever downloaded software from                 1196               133
the Internet without paying for it?


WHERE YOU’RE GOING


In this chapter, you will continue your study of 
inferential statistics. But now, instead of making 
an estimate about a population parameter, you 
will learn how to test a claim about a parameter. 


For instance, suppose that you work for Harris 
Interactive and are asked to test a claim that the 
proportion of U.S. students ages 8 to 18 who 
download music without paying for it is 
p = 0.25. To test the claim, you take a random 
sample of n = 1196 students and find that 361 of 
them download music without paying for it. Your 
sample statistic is $\hat{p} \approx 0.302$.




Is your sample statistic different enough from the 
claim (p = 0.25) to decide that the claim is false? 
The answer lies in the sampling distribution of 
sample proportions taken from a population in 
which p = 0.25. The graph below shows that 
your sample statistic is more than 4 standard 
errors from the claimed value. If the claim is 
true, the probability of the sample statistic being 
4 standard errors or more from the claimed 
value is extremely small. Something is wrong! If 
your sample was truly random, then you can 
conclude that the actual proportion of the 
student population is not 0.25. In other words, 
you tested the original claim (hypothesis), and 
you decided to reject it. 
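The reasoning above can be checked numerically. The sketch below computes the sample proportion and its distance from the claimed value in standard errors, using the standard error under the claim, sqrt(p(1 - p)/n). The function name `standardized_proportion` is hypothetical.

```python
import math

def standardized_proportion(x, n, p_claim):
    """Return the sample proportion and its standardized z-value,
    assuming the claimed proportion p_claim is true."""
    p_hat = x / n
    standard_error = math.sqrt(p_claim * (1 - p_claim) / n)
    z = (p_hat - p_claim) / standard_error
    return p_hat, z

# 361 of 1196 students; claim p = 0.25
p_hat, z = standardized_proportion(361, 1196, 0.25)
print(f"sample proportion = {p_hat:.3f}")
print(f"standard errors from the claim = {z:.2f}")
```

A z-value above 4 corresponds to a very small tail probability under the claim, which is why the passage concludes that the claim should be rejected.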


[Figure: Sampling Distribution of sample proportions for the claim p = 0.25, with the sample statistic $\hat{p} = 0.302$ marked; second axis: Standardized z-value]
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7.1 


CHAPTER 7 


WHAT YOU SHOULD LEARN 


» A practical introduction to 
hypothesis tests 


vv 


How to state a null hypothesis 
and an alternative hypothesis 


Gh 


How to identify type | and 
type Il errors and interpret 
the level of significance 


vv 


How to know whether to use 
a one-tailed or two-tailed 
statistical test and find a 
P-value 


Vv 


How to make and interpret a 
decision based on the results 
of a statistical test 


Vv 


How to write a claim for a 
hypothesis test 


INSIGHT 


As you study this chapter, don’t 
get confused regarding concepts 
of certainty and importance. 

For instance, even if you 
were very certain that the 
mean gas mileage of a type 
of hybrid vehicle is not 
50 miles per gallon, the 
actual mean mileage 
might be very close to 
this value and the 
difference might not be 
important. 
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HYPOTHESIS TESTING WITH ONE SAMPLE 


Introduction to Hypothesis Testing 


Hypothesis Tests » Stating a Hypothesis » Types of Errors and Level of
Significance » Statistical Tests and P-Values » Making a Decision and
Interpreting the Decision » Strategies for Hypothesis Testing


» HYPOTHESIS TESTS


Throughout the remainder of this course, you will study an important technique 
in inferential statistics called hypothesis testing. A hypothesis test is a process 
that uses sample statistics to test a claim about the value of a population 
parameter. Researchers in fields such as medicine, psychology, and business rely 
on hypothesis testing to make informed decisions about new medicines, 
treatments, and marketing strategies. 

For instance, suppose an automobile manufacturer advertises that its new
hybrid car has a mean gas mileage of 50 miles per gallon. If you suspect that
the mean mileage is not 50 miles per gallon, how could you show that the
advertisement is false?

Obviously, you cannot test all the vehicles, but you can still make a
reasonable decision about the mean gas mileage by taking a random sample
from the population of vehicles and measuring the mileage of each. If the sample
mean differs enough from the advertisement’s mean, you can decide that the
advertisement is wrong.

For instance, to test that the mean gas mileage of all hybrid vehicles of this
type is μ = 50 miles per gallon, you could take a random sample of n = 30
vehicles and measure the mileage of each. Suppose you obtain a sample mean of
x̄ = 47 miles per gallon with a sample standard deviation of s = 5.5 miles per
gallon. Does this indicate that the manufacturer’s advertisement is false?

To decide, you do something unusual: you assume the advertisement is
correct! That is, you assume that μ = 50. Then, you examine the sampling
distribution of sample means (with n = 30) taken from a population in which
μ = 50 and σ = 5.5. From the Central Limit Theorem, you know this sampling
distribution is normal with a mean of 50 and standard error of


5.5/√30 ≈ 1.


In the graph at the right, notice that your sample mean of x̄ = 47 miles per
gallon is highly unlikely: it is about 3 standard errors from the claimed mean!
Using the techniques you studied in Chapter 5, you can determine that if the
advertisement is true, the probability of obtaining a sample mean of 47 or
less is about 0.0013. This is an unusual event! Your assumption that the company’s
advertisement is correct has led you to an improbable result. So, either you had
a very unusual sample, or the advertisement is probably false. The logical
conclusion is that the advertisement is probably false.

[Figure: Sampling Distribution of x̄. The horizontal axis shows x̄ from 46 to 54 with the corresponding standardized z-values from −4 to 4; the hypothesized mean μ = 50 and the sample mean x̄ = 47 are marked.]
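The hybrid-car calculation can be reproduced in a few lines. This is a minimal sketch in Python (an assumed language; the variable names are illustrative) that uses the standard library's `statistics.NormalDist` for the normal curve and, as in the text, treats s = 5.5 as the population standard deviation.

```python
from math import sqrt
from statistics import NormalDist

mu_claimed = 50.0   # advertised mean mileage (mpg), assumed to be true
sigma = 5.5         # sample standard deviation, used in place of sigma as in the text
n = 30              # sample size
x_bar = 47.0        # observed sample mean (mpg)

std_error = sigma / sqrt(n)                # standard error of the sample mean
z = (x_bar - mu_claimed) / std_error       # standardized value of the sample mean
prob_at_most = NormalDist().cdf(z)         # P(sample mean <= 47) if mu = 50

print(f"standard error = {std_error:.3f}, z = {z:.2f}, P = {prob_at_most:.4f}")
```

Because the sketch does not round the standard error to 1, its z-value is slightly closer to zero than −3, so the printed probability is close to, but not exactly, the 0.0013 quoted in the text.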
» STATING A HYPOTHESIS


A statement about a population parameter is called a statistical hypothesis. To 
test a population parameter, you should carefully state a pair of hypotheses—one 
that represents the claim and the other, its complement. When one of 
these hypotheses is false, the other must be true. Either hypothesis—the null 
hypothesis or the alternative hypothesis—may represent the original claim. 


1. A null hypothesis H₀ is a statistical hypothesis that contains a statement of
equality, such as ≤, =, or ≥.

2. The alternative hypothesis Hₐ is the complement of the null hypothesis.
It is a statement that must be true if H₀ is false and it contains a statement
of strict inequality, such as >, ≠, or <.

H₀ is read as “H sub-zero” or “H naught” and Hₐ is read as “H sub-a.”


The term null hypothesis was introduced by Ronald Fisher (see page 33).
If the statement in the null hypothesis is not true, then the alternative
hypothesis must be true.


To write the null and alternative hypotheses, translate the claim made about
the population parameter from a verbal statement to a mathematical statement.
Then, write its complement. For instance, if the claim value is k and the
population parameter is μ, then some possible pairs of null and alternative
hypotheses are


H₀: μ ≤ k        H₀: μ ≥ k        H₀: μ = k
Hₐ: μ > k        Hₐ: μ < k        Hₐ: μ ≠ k


Regardless of which of the three pairs of hypotheses you use, you always
assume μ = k and examine the sampling distribution on the basis of this
assumption. Within this sampling distribution, you will determine whether or not
a sample statistic is unusual.

The following table shows the relationship between possible verbal
statements about the parameter μ and the corresponding null and alternative
hypotheses. Similar statements can be made to test other population parameters,
such as p, σ, or σ².


Verbal statement H₀               Mathematical     Verbal statement Hₐ
The mean is ...                   statements       The mean is ...

... greater than or equal to k.   H₀: μ ≥ k        ... less than k.
... at least k.                   Hₐ: μ < k        ... below k.
... not less than k.                               ... fewer than k.

... less than or equal to k.      H₀: μ ≤ k        ... greater than k.
... at most k.                    Hₐ: μ > k        ... above k.
... not more than k.                               ... more than k.

... equal to k.                   H₀: μ = k        ... not equal to k.
... k.                            Hₐ: μ ≠ k        ... different from k.
... exactly k.                                     ... not k.


A sample of 25 randomly selected patients with early-stage high blood
pressure underwent a special chiropractic adjustment to help lower their blood
pressure. After eight weeks, the mean drop in the patients’ systolic blood
pressure was 14 millimeters of mercury. So, it is claimed that the mean drop in
systolic blood pressure of all patients who undergo this special chiropractic
adjustment is 14 millimeters of mercury. (Adapted from Journal of Human
Hypertension)

Determine a null hypothesis and alternative hypothesis for this claim.
[Figure: Number-line graphs of H₀ and Hₐ, on axes from 11 to 19 and from 14 to 22.]

In each of these graphs, notice that each point on the number line is in
H₀ or Hₐ, but no point is in both.


EXAMPLE 1 


» Stating the Null and Alternative Hypotheses


Write the claim as a mathematical sentence. State the null and alternative
hypotheses, and identify which represents the claim.


1. A school publicizes that the proportion of its students who are involved in
at least one extracurricular activity is 61%.

2. A car dealership announces that the mean time for an oil change is less than
15 minutes.

3. A company advertises that the mean life of its furnaces is more than
18 years.


» Solution


1. The claim “the proportion...is 61%” can be written as p = 0.61. Its
complement is p ≠ 0.61. Because p = 0.61 contains the statement of
equality, it becomes the null hypothesis. In this case, the null hypothesis
represents the claim.

H₀: p = 0.61 (Claim)
Hₐ: p ≠ 0.61


2. The claim “the mean...is less than 15 minutes” can be written as μ < 15.
Its complement is μ ≥ 15. Because μ ≥ 15 contains the statement of
equality, it becomes the null hypothesis. In this case, the alternative
hypothesis represents the claim.

H₀: μ ≥ 15 minutes
Hₐ: μ < 15 minutes (Claim)


3. The claim “the mean...is more than 18 years” can be written as μ > 18.
Its complement is μ ≤ 18. Because μ ≤ 18 contains the statement of
equality, it becomes the null hypothesis. In this case, the alternative
hypothesis represents the claim.

H₀: μ ≤ 18 years
Hₐ: μ > 18 years (Claim)
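The rule used in these solutions (write the claim, write its complement, and put whichever statement contains equality in the null hypothesis) is mechanical enough to sketch in code. The following Python sketch is only an illustration: the function name `state_hypotheses` and the plain-text relation symbols (`<=`, `>=`, `!=`) are assumptions made here, not notation from the text.

```python
# Complement of each relation symbol.
COMPLEMENT = {"<": ">=", "<=": ">", ">": "<=", ">=": "<", "=": "!=", "!=": "="}

# Relations that contain a statement of equality belong in the null hypothesis.
HAS_EQUALITY = {"<=", ">=", "="}


def state_hypotheses(parameter, relation, value):
    """Return (H0, Ha, label of the hypothesis that represents the claim)."""
    claim = f"{parameter} {relation} {value}"
    complement = f"{parameter} {COMPLEMENT[relation]} {value}"
    if relation in HAS_EQUALITY:
        return claim, complement, "H0"   # the claim contains equality, so it is H0
    return complement, claim, "Ha"       # otherwise the claim is Ha


# The three claims of Example 1.
for args in [("p", "=", 0.61), ("mu", "<", 15), ("mu", ">", 18)]:
    h0, ha, claim_is = state_hypotheses(*args)
    print(f"H0: {h0}    Ha: {ha}    (claim is {claim_is})")
```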


» Try It Yourself 1


Write the claim as a mathematical sentence. State the null and alternative
hypotheses, and identify which represents the claim.


1. A consumer analyst reports that the mean life of a certain type of
automobile battery is not 74 months.

2. An electronics manufacturer publishes that the variance of the life of its
home theater systems is less than or equal to 2.7.

3. A realtor publicizes that the proportion of homeowners who feel their
house is too small for their family is more than 24%.


a. Identify the verbal claim and write it as a mathematical statement.
b. Write the complement of the claim.
c. Identify the null and alternative hypotheses and determine which one
represents the claim. Answer: Page A40
» TYPES OF ERRORS AND LEVEL OF SIGNIFICANCE


No matter which hypothesis represents the claim, you always begin a hypothesis 
test by assuming that the equality condition in the null hypothesis is true. So, 
when you perform a hypothesis test, you make one of two decisions: 


1. reject the null hypothesis or 
2. fail to reject the null hypothesis. 


Because your decision is based on a sample rather than the entire population, 
there is always the possibility you will make the wrong decision. 

For instance, suppose you claim that a certain coin is not fair. To test your 
claim, you flip the coin 100 times and get 49 heads and 51 tails. You would 
probably agree that you do not have enough evidence to support your claim. 
Even so, it is possible that the coin is actually not fair and you had an unusual 
sample. 

But what if you flip the coin 100 times and get 21 heads and 79 tails? It would 
be a rare occurrence to get only 21 heads out of 100 tosses with a fair coin. So, 
you probably have enough evidence to support your claim that the coin is not 
fair. However, you can’t be 100% sure. It is possible that the coin is fair and you 
had an unusual sample. 

If p represents the proportion of heads, the claim that “the coin is not fair”
can be written as the mathematical statement p ≠ 0.5. Its complement, “the
coin is fair,” is written as p = 0.5. So, your null hypothesis and alternative
hypothesis are


H₀: p = 0.5
and
Hₐ: p ≠ 0.5. (Claim)
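To see how differently the two coin experiments behave, you can compute how likely a fair coin is to land at least as far from 50 heads as each result. The sketch below is in Python (an assumed language) and uses exact binomial probabilities; the function name `prob_at_least_as_far` is hypothetical.

```python
from math import comb


def prob_at_least_as_far(heads, n=100, p=0.5):
    """P(|X - n*p| >= |heads - n*p|) for X ~ Binomial(n, p)."""
    expected = n * p
    distance = abs(heads - expected)
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)   # binomial probability of exactly k heads
        for k in range(n + 1)
        if abs(k - expected) >= distance          # keep outcomes at least as far from n*p
    )


print(f"49 heads: {prob_at_least_as_far(49):.4f}")   # very common for a fair coin
print(f"21 heads: {prob_at_least_as_far(21):.2e}")   # extremely rare for a fair coin
```

The first probability is large, so 49 heads gives no reason to doubt p = 0.5. The second is tiny, which supports the claim p ≠ 0.5 without proving it.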


Remember, the only way to be absolutely certain of whether H₀ is true or
false is to test the entire population. Because your decision (to reject H₀ or to
fail to reject H₀) is based on a sample, you must accept the fact that your decision
might be incorrect. You might reject a null hypothesis when it is actually true. Or,
you might fail to reject a null hypothesis when it is actually false.


DEFINITION 


A type I error occurs if the null hypothesis is rejected when it is true. 


A type II error occurs if the null hypothesis is not rejected when it is false. 


The following table shows the four possible outcomes of a hypothesis test. 


Decision             H₀ is true.         H₀ is false.

Do not reject H₀.    Correct decision    Type II error
Reject H₀.           Type I error        Correct decision
Hypothesis testing is sometimes compared to the legal system used in the
United States. The possible outcomes of a trial are shown in the following table.


                 Truth about defendant
Verdict          Innocent          Guilty

Not guilty       Justice           Type II error
Guilty           Type I error      Justice


Under this system, the following steps are used.


1. A carefully worded accusation is written. 


2. The defendant is assumed innocent (H₀) until proven guilty. The burden of
proof lies with the prosecution. If the evidence is not strong enough, there is 
no conviction. A “not guilty” verdict does not prove that a defendant is 
innocent. 


3. The evidence needs to be conclusive beyond a reasonable doubt. The system 
assumes that more harm is done by convicting the innocent (type I error) 
than by not convicting the guilty (type II error). 


EXAMPLE 2 


» Identifying Type I and Type II Errors

The USDA limit for salmonella contamination for chicken is 20%. A meat 
inspector reports that the chicken produced by a company exceeds the USDA 
limit. You perform a hypothesis test to determine whether the meat inspector’s 
claim is true. When will a type I or type II error occur? Which is more serious? 
(Source: U.S. Department of Agriculture) 


> Solution 


Let p represent the proportion of the chicken that is contaminated. The meat 
inspector’s claim is “more than 20% is contaminated.” You can write the null 
and alternative hypotheses as follows. 


H₀: p ≤ 0.2            The proportion is less than or equal to 20%.
Hₐ: p > 0.2 (Claim)    The proportion is greater than 20%.


[Figure: Number line for p. Chicken meets USDA limits, H₀: p ≤ 0.2; chicken exceeds USDA limits, Hₐ: p > 0.2.]


A type I error will occur if the actual proportion of contaminated chicken is
less than or equal to 0.2, but you reject H₀. A type II error will occur if the
actual proportion of contaminated chicken is greater than 0.2, but you do not
reject H₀. With a type I error, you might create a health scare and hurt
the sales of chicken producers who were actually meeting the USDA limits.
With a type II error, you could be allowing chicken that exceeded the USDA
contamination limit to be sold to consumers. A type II error is more serious
because it could result in sickness or even death.


> Try It Yourself 2 


A company specializing in parachute assembly states that its main parachute 
failure rate is not more than 1%. You perform a hypothesis test to determine 
whether the company’s claim is false. When will a type I or type II error occur? 
Which is more serious? 


a. State the null and alternative hypotheses. 
b. Write the possible type I and type II errors.
c. Determine which error is more serious. Answer: Page A40 
INSIGHT


When you decrease α (the maximum allowable probability of making a type I
error), you are likely to be increasing β. The value 1 − β is called the power
of the test. It represents the probability of rejecting the null hypothesis
when it is false. The value of the power is difficult (and sometimes
impossible) to find in most cases.
You will reject the null hypothesis when the sample statistic from the
sampling distribution is unusual. You have already identified unusual events to
be those that occur with a probability of 0.05 or less. When statistical tests are
used, an unusual event is sometimes required to have a probability of 0.10 or less,
0.05 or less, or 0.01 or less. Because there is variation from sample to sample,
there is always a possibility that you will reject a null hypothesis when it is actually
true. In other words, although the null hypothesis is true, your sample statistic is
determined to be an unusual event in the sampling distribution. You can decrease
the probability of this happening by lowering the level of significance.


DEFINITION 


In a hypothesis test, the level of significance is your maximum allowable
probability of making a type I error. It is denoted by α, the lowercase Greek
letter alpha.


The probability of a type II error is denoted by β, the lowercase Greek
letter beta.


By setting the level of significance at a small value, you are saying that you
want the probability of rejecting a true null hypothesis to be small. Three
commonly used levels of significance are α = 0.10, α = 0.05, and α = 0.01.
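A simulation can make α and the power 1 − β more concrete. The sketch below, written in Python (an assumed language), repeatedly draws samples and runs a two-tailed z-test of H₀: μ = 50. Everything in it is an illustrative assumption: the population is taken to be normal with σ = 5.5 known, the sample size is n = 30, the function name `rejection_rate` is hypothetical, and the random seed is fixed only to make the run repeatable.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_mu, claimed_mu=50.0, sigma=5.5, n=30,
                   alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated two-tailed z-tests that reject H0: mu = claimed_mu."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    std_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (fmean(sample) - claimed_mu) / std_error
        p_value = 2 * std_normal.cdf(-abs(z))   # area in both tails
        if p_value <= alpha:
            rejections += 1
    return rejections / trials


type_i_rate = rejection_rate(true_mu=50.0)   # H0 is true: each rejection is a type I error
power = rejection_rate(true_mu=47.0)         # H0 is false: each rejection is a correct decision

print(f"type I error rate = {type_i_rate:.3f}")
print(f"power (1 - beta)  = {power:.3f}")
```

When H₀ is true, the rejection rate should land near α. When the true mean is 47, the rejection rate estimates the power, and one minus that value estimates β for this particular alternative.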


» STATISTICAL TESTS AND P-VALUES


After stating the null and alternative hypotheses and specifying the level of 
significance, the next step in a hypothesis test is to obtain a random sample from 
the population and calculate sample statistics such as the mean and the standard 
deviation. The statistic that is compared with the parameter in the null hypothesis 
is called the test statistic. The type of test used and the sampling distribution are 
based on the test statistic. 

In this chapter, you will learn about several one-sample statistical tests. The 
following table shows the relationships between population parameters and their 
corresponding test statistics and standardized test statistics. 


Population     Test          Standardized
parameter      statistic     test statistic

μ              x̄             z (Section 7.2, n ≥ 30),
                             t (Section 7.3, n < 30)

p              p̂             z (Section 7.4)

σ²             s²            χ² (Section 7.5)


One way to decide whether to reject the null hypothesis is to determine 
whether the probability of obtaining the standardized test statistic (or one that is 
more extreme) is less than the level of significance. 


DEFINITION 


If the null hypothesis is true, a P-value (or probability value) of a hypothesis 
test is the probability of obtaining a sample statistic with a value as extreme or 
more extreme than the one determined from the sample data. 
STUDY TIP
The third type of test is called a two-tailed test because evidence that
would support the alternative hypothesis could lie in either tail of the
sampling distribution.


The P-value of a hypothesis test depends on the nature of the test. There
are three types of hypothesis tests: left-tailed, right-tailed, and two-tailed. The
type of test depends on the location of the region of the sampling distribution
that favors a rejection of H₀. This region is indicated by the alternative
hypothesis.


DEFINITION 


1. If the alternative hypothesis Hₐ contains the less-than inequality symbol
(<), the hypothesis test is a left-tailed test.

H₀: μ ≥ k
Hₐ: μ < k

[Figure: Left-Tailed Test. P is the area to the left of the standardized test statistic.]


2. If the alternative hypothesis Hₐ contains the greater-than inequality
symbol (>), the hypothesis test is a right-tailed test.

H₀: μ ≤ k
Hₐ: μ > k

[Figure: Right-Tailed Test. P is the area to the right of the standardized test statistic.]


3. If the alternative hypothesis Hₐ contains the not-equal-to symbol (≠), the
hypothesis test is a two-tailed test. In a two-tailed test, each tail has an
area of ½P.

H₀: μ = k
Hₐ: μ ≠ k

[Figure: Two-Tailed Test. P is twice the area to the left of the negative standardized test statistic. P is twice the area to the right of the positive standardized test statistic.]


The smaller the P-value of the test, the more evidence there is to reject the 
null hypothesis. A very small P-value indicates an unusual event. Remember, 
however, that even a very low P-value does not constitute proof that the null 
hypothesis is false, only that it is probably false. 
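The three definitions translate directly into a small function. The sketch below is in Python (an assumed language) and covers only the case in which the standardized test statistic follows the standard normal distribution; the function name `p_value` and the tail labels are hypothetical.

```python
from statistics import NormalDist


def p_value(z, tail):
    """P-value for a standardized test statistic z that is standard normal.

    tail is 'left', 'right', or 'two', matching the symbol in Ha (<, >, or not equal).
    """
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z)             # area to the left of z
    if tail == "right":
        return 1 - std_normal.cdf(z)         # area to the right of z
    if tail == "two":
        return 2 * std_normal.cdf(-abs(z))   # twice the area in one tail
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    print(f"z = -1.96, {tail}-tailed: P = {p_value(-1.96, tail):.4f}")
```

The same z-value gives very different P-values depending on the tail, which is why the alternative hypothesis must be fixed before the P-value is computed.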
EXAMPLE 3


» Identifying the Nature of a Hypothesis Test

For each claim, state H₀ and Hₐ in words and in symbols. Then determine
whether the hypothesis test is a left-tailed test, right-tailed test, or two-tailed
test. Sketch a normal sampling distribution and shade the area for the P-value.


1. A school publicizes that the proportion of its students who are involved 
in at least one extracurricular activity is 61%. 


2. A car dealership announces that the mean time for an oil change is less 
than 15 minutes. 


3. A company advertises that the mean life of its furnaces is more than 
18 years. 


» Solution

   In Symbols        In Words

1. H₀: p = 0.61      The proportion of students who are involved in at
                     least one extracurricular activity is 61%.
   Hₐ: p ≠ 0.61      The proportion of students who are involved in at
                     least one extracurricular activity is not 61%.

Because Hₐ contains the ≠ symbol, the test is a two-tailed hypothesis test.
The graph of the normal sampling distribution at the left shows the shaded
area for the P-value.


2. H₀: μ ≥ 15 min    The mean time for an oil change is greater than or
                     equal to 15 minutes.
   Hₐ: μ < 15 min    The mean time for an oil change is less than
                     15 minutes.

Because Hₐ contains the < symbol, the test is a left-tailed hypothesis test.
The graph of the normal sampling distribution at the left shows the shaded
area for the P-value.


3. H₀: μ ≤ 18 yr     The mean life of the furnaces is less than or equal to
                     18 years.
   Hₐ: μ > 18 yr     The mean life of the furnaces is more than 18 years.

Because Hₐ contains the > symbol, the test is a right-tailed hypothesis test.
The graph of the normal sampling distribution at the left shows the shaded
area for the P-value.


» Try It Yourself 3


For each claim, state H₀ and Hₐ in words and in symbols. Then determine
whether the hypothesis test is a left-tailed test, right-tailed test, or two-tailed
test. Sketch a normal sampling distribution and shade the area for the P-value.


1. A consumer analyst reports that the mean life of a certain type of automobile 
battery is not 74 months. 

2. An electronics manufacturer publishes that the variance of the life of its 
home theater systems is less than or equal to 2.7. 

3. A realtor publicizes that the proportion of homeowners who feel their 
house is too small for their family is more than 24%. 


a. Write H₀ and Hₐ in words and in symbols.

b. Determine whether the test is left-tailed, right-tailed, or two-tailed.

c. Sketch the sampling distribution and shade the area for the P-value.
Answer: Page A40




INSIGHT 


In this chapter, you will learn that there are two basic types of decision
rules for deciding whether to reject H₀ or fail to reject H₀. The decision rule
described on this page is based on P-values. The second basic type of decision
rule is based on rejection regions. When the standardized test statistic falls
in the rejection region, the observed probability (P-value) of a type I error
is less than α. You will learn more about rejection regions in the next
section.


» MAKING A DECISION AND INTERPRETING THE DECISION


To conclude a hypothesis test, you make a decision and interpret that decision. 
There are only two possible outcomes to a hypothesis test: (1) reject the null 
hypothesis and (2) fail to reject the null hypothesis. 


DECISION RULE BASED ON P-VALUE 


To use a P-value to make a conclusion in a hypothesis test, compare the
P-value with α.

1. If P ≤ α, then reject H₀.

2. If P > α, then fail to reject H₀.


Failing to reject the null hypothesis does not mean that you have accepted the 
null hypothesis as true. It simply means that there is not enough evidence to reject 
the null hypothesis. If you want to support a claim, state it so that it becomes the 
alternative hypothesis. If you want to reject a claim, state it so that it becomes the 
null hypothesis. The following table will help you interpret your decision. 


Decision              Claim is H₀.                    Claim is Hₐ.

Reject H₀.            There is enough evidence        There is enough evidence
                      to reject the claim.            to support the claim.

Fail to reject H₀.    There is not enough             There is not enough
                      evidence to reject the claim.   evidence to support the claim.
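The decision rule and the interpretation table can be combined into one small helper. This is a minimal sketch in Python (an assumed language); the function name `interpret` and the labels "H0" and "Ha" are hypothetical choices made for illustration.

```python
def interpret(p_value, alpha, claim_is):
    """Return (decision, interpretation) for a test decided by its P-value.

    claim_is is 'H0' if the claim is the null hypothesis, 'Ha' if it is the alternative.
    """
    if claim_is not in ("H0", "Ha"):
        raise ValueError("claim_is must be 'H0' or 'Ha'")
    reject = p_value <= alpha                            # decision rule: reject H0 when P <= alpha
    decision = "Reject H0" if reject else "Fail to reject H0"
    verb = "reject" if claim_is == "H0" else "support"   # a claim in H0 can only be rejected
    amount = "enough" if reject else "not enough"
    return decision, f"There is {amount} evidence to {verb} the claim."


print(interpret(0.012, 0.05, "Ha"))
print(interpret(0.200, 0.05, "H0"))
```

Note that no branch ever reports that a claim in H₀ is supported, which matches the point that failing to reject H₀ is not the same as accepting it.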


EXAMPLE 4 


» Interpreting a Decision


You perform a hypothesis test for each of the following claims. How should
you interpret your decision if you reject H₀? If you fail to reject H₀?


1. H₀ (Claim): A school publicizes that the proportion of its students who are
involved in at least one extracurricular activity is 61%.


2. Hₐ (Claim): A car dealership announces that the mean time for an oil change
is less than 15 minutes.


» Solution


1. The claim is represented by H₀. If you reject H₀, then you should
conclude “there is enough evidence to reject the school’s claim that the
proportion of students who are involved in at least one extracurricular
activity is 61%.” If you fail to reject H₀, then you should conclude “there is
not enough evidence to reject the school’s claim that the proportion of
students who are involved in at least one extracurricular activity is 61%.”


2. The claim is represented by Hₐ, so the null hypothesis is “the mean time for
an oil change is greater than or equal to 15 minutes.” If you reject H₀, then
you should conclude “there is enough evidence to support the dealership’s
claim that the mean time for an oil change is less than 15 minutes.” If
you fail to reject H₀, then you should conclude “there is not enough
evidence to support the dealership’s claim that the mean time for an oil
change is less than 15 minutes.”
STUDY TIP


When performing a hypothesis test, you should always state the null and
alternative hypotheses before collecting data. You should not collect the data
first and then create a hypothesis based on something unusual in the data.


» Try It Yourself 4


You perform a hypothesis test for the following claim. How should you
interpret your decision if you reject H₀? If you fail to reject H₀?


Hₐ (Claim): A realtor publicizes that the proportion of homeowners who
feel their house is too small for their family is more than 24%.


a. Interpret your decision if you reject the null hypothesis. 
b. Interpret your decision if you fail to reject the null hypothesis. 
Answer: Page A41 


The general steps for a hypothesis test using P-values are summarized 
below. 


STEPS FOR HYPOTHESIS TESTING 


1. State the claim mathematically and verbally. Identify the null and alternative
hypotheses.

H₀: ?    Hₐ: ?

2. Specify the level of significance.

α = ?

3. Determine the standardized sampling distribution and sketch its graph.
This sampling distribution is based on the assumption that H₀ is true.

4. Calculate the test statistic and its corresponding standardized test
statistic. Add it to your sketch.

5. Find the P-value.

6. Use the following decision rule.

Is the P-value less than or equal to the level of significance?
If no, fail to reject H₀.
If yes, reject H₀.

7. Write a statement to interpret the decision in the context of the original
claim.


In the steps above, the graphs show a right-tailed test. However, the same 
basic steps also apply to left-tailed and two-tailed tests. 
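As an illustration of the seven steps on a two-tailed test, the sketch below applies them to the music-download claim from the chapter opener (p = 0.25, n = 1196, 361 students). It is written in Python (an assumed language). The level of significance α = 0.05 is chosen here only for illustration, and the z-test for a proportion it uses is the one the text assigns to Section 7.4.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: state the hypotheses. H0: p = 0.25 (claim), Ha: p != 0.25.
p0 = 0.25

# Step 2: specify the level of significance (an illustrative choice).
alpha = 0.05

# Step 3: if H0 is true, the sample proportion is approximately normal
# with mean p0 and this standard error.
n, x = 1196, 361
std_error = sqrt(p0 * (1 - p0) / n)

# Step 4: test statistic and standardized test statistic.
p_hat = x / n
z = (p_hat - p0) / std_error

# Step 5: P-value for a two-tailed test (twice the area in one tail).
p_value = 2 * NormalDist().cdf(-abs(z))

# Step 6: decision rule.
reject_h0 = p_value <= alpha

# Step 7: interpret the decision in the context of the claim (the claim is H0).
amount = "enough" if reject_h0 else "not enough"
print(f"z = {z:.2f}, P-value = {p_value:.6f}")
print(f"There is {amount} evidence to reject the claim that p = 0.25.")
```

The conclusion agrees with the chapter opener, where the claim was rejected because the sample statistic was more than 4 standard errors from the claimed value.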


» STRATEGIES FOR HYPOTHESIS TESTING


In a courtroom, the strategy used by an attorney depends on whether the 
attorney is representing the defense or the prosecution. In a similar way, the 
strategy that you will use in hypothesis testing should depend on whether you are 
trying to support or reject a claim. Remember that you cannot use a hypothesis
test to support your claim if your claim is the null hypothesis. So, as a researcher, 
if you want a conclusion that supports your claim, word your claim so it is the 
alternative hypothesis. If you want to reject a claim, word it so it is the null 
hypothesis. 


EXAMPLE 5 


» Writing the Hypotheses


A medical research team is investigating the benefits of a new surgical
treatment. One of the claims is that the mean recovery time for patients after
the new treatment is less than 96 hours. How would you write the null and
alternative hypotheses if (1) you are on the research team and want to support
the claim? (2) you are on an opposing team and want to reject the claim?


» Solution


1. To answer the question, first think about the context of the claim. Because
you want to support this claim, make the alternative hypothesis state
that the mean recovery time for patients is less than 96 hours. So,
Hₐ: μ < 96 hours. Its complement, μ ≥ 96 hours, would be the null
hypothesis.

H₀: μ ≥ 96
Hₐ: μ < 96 (Claim)


2. First think about the context of the claim. As an opposing researcher, you
do not want the recovery time to be less than 96 hours. Because you want
to reject this claim, make it the null hypothesis. So, H₀: μ ≤ 96 hours. Its
complement, μ > 96 hours, would be the alternative hypothesis.

H₀: μ ≤ 96 (Claim)
Hₐ: μ > 96


» Try It Yourself 5


1. You represent a chemical company that is being sued for paint damage to
automobiles. You want to support the claim that the mean repair cost per
automobile is less than $650. How would you write the null and alternative
hypotheses?


2. You are on a research team that is investigating the mean temperature of
adult humans. The commonly accepted claim is that the mean temperature
is about 98.6°F. You want to show that this claim is false. How would you
write the null and alternative hypotheses?


a. Determine whether you want to support or reject the claim.
b. Write the null and alternative hypotheses. Answer: Page A41
BUILDING BASIC SKILLS AND VOCABULARY


1. What are the two types of hypotheses used in a hypothesis test? How are 
they related? 


2. Describe the two types of error possible in a hypothesis test decision. 


3. What are the two decisions that you can make from performing a hypothesis 
test? 


4. Does failing to reject the null hypothesis mean that the null hypothesis is 
true? Explain. 


True or False? In Exercises 5-10, determine whether the statement is true or 
false. If it is false, rewrite it as a true statement. 


5. In a hypothesis test, you assume the alternative hypothesis is true. 
6. A statistical hypothesis is a statement about a sample. 


7. If you decide to reject the null hypothesis, you can support the alternative 
hypothesis. 


8. The level of significance is the maximum probability you allow for rejecting 
a null hypothesis when it is actually true. 


9. A large P-value in a test will favor rejection of the null hypothesis. 


10. If you want to support a claim, write it as your null hypothesis. 


Stating Hypotheses In Exercises 11-16, use the given statement to represent
a claim. Write its complement and state which is H₀ and which is Hₐ.


11. μ = 645 12. μ < 128
13. σ ≠ 5 14. σ² = 12
15. p < 0.45 16. p = 0.21


Graphical Analysis In Exercises 17-20, match the alternative hypothesis with
its graph. Then state the null hypothesis and sketch its graph.


17. Hₐ: μ > 3
18. Hₐ: μ < 3
19. Hₐ: μ ≠ 3
20. Hₐ: μ > 2

[Figure: Four number-line graphs labeled (a) through (d), each with tick marks at 1, 2, 3, and 4.]


Identifying Tests In Exercises 21-24, determine whether the hypothesis
test with the given null and alternative hypotheses is left-tailed, right-tailed, or
two-tailed.


21. H₀: μ ≤ 8.0        22. H₀: σ ≥ 5.2
    Hₐ: μ > 8.0            Hₐ: σ < 5.2
23. H₀: σ² = 142       24. H₀: p = 0.25
    Hₐ: σ² ≠ 142           Hₐ: p ≠ 0.25
USING AND INTERPRETING CONCEPTS


Stating the Hypotheses In Exercises 25-30, write the claim as a mathematical
sentence. State the null and alternative hypotheses, and identify which represents 
the claim. 


25. Light Bulbs A light bulb manufacturer claims that the mean life of a 
certain type of light bulb is more than 750 hours. 


26. Shipping Errors As stated by a company’s shipping department, the 
number of shipping errors per million shipments has a standard deviation 
that is less than 3. 


27. Base Price of an ATV The standard deviation of the base price of a certain 
type of all-terrain vehicle is no more than $320. 


28. Oak Trees A state park claims that the mean height of the oak trees in the 
park is at least 85 feet. 


29. Drying Time A company claims that its brands of paint have a mean 
drying time of less than 45 minutes. 


30. MP3 Players According to a recent survey, 74% of college students own 
an MP3 player. (Source: Harris Interactive) 


Identifying Errors In Exercises 31-36, write sentences describing type I and
type II errors for a hypothesis test of the indicated claim. 


31. Repeat Buyers A furniture store claims that at least 60% of its new 
customers will return to buy their next piece of furniture. 


32. Flow Rate A garden hose manufacturer advertises that the mean flow rate 
of a certain type of hose is 16 gallons per minute. 


33. Chess A local chess club claims that the length of time to play a game has 
a standard deviation of more than 12 minutes. 


34. Video Game Systems A researcher claims that the proportion of adults in 
the United States who own a video game system is not 26%. 


35. Police A police station publicizes that at most 20% of applicants become 
police officers. 


36. Computers A computer repairer advertises that the mean cost of removing 
a virus infection is less than $100. 


Identifying Tests In Exercises 37-42, state H₀ and Hₐ in words and in
symbols. Then determine whether the hypothesis test is left-tailed, right-tailed, or 
two-tailed. Explain your reasoning. 


37. Security Alarms At least 14% of all homeowners have a home security 
alarm. 


38. Clocks A manufacturer of grandfather clocks claims that the mean time its 
clocks lose is no more than 0.02 second per day. 


39. Golf The standard deviation of the 18-hole scores for a golfer is less than 
2.1 strokes. 


40. Lung Cancer A government report claims that the proportion of lung 
cancer cases that are due to smoking is 87%. (Source: LungCancer.org) 


41. Baseball A baseball team claims that the mean length of its games is less
than 2.5 hours.


42. Tuition A state claims that the mean tuition of its universities is no more
than $25,000 per year.


Interpreting a Decision In Exercises 43-48, consider each claim. If a 
hypothesis test is performed, how should you interpret a decision that 


(a) rejects the null hypothesis? 


(b) fails to reject the null hypothesis? 


43. Swans A scientist claims that the mean incubation period for swan eggs is
less than 40 days.


44. Lawn Mowers The standard deviation of the life of a certain type of lawn
mower is at most 2.8 years.


45. Hourly Wages The U.S. Department of Labor claims that the proportion of
full-time workers earning over $450 per week is greater than 75%. (Adapted
from U.S. Bureau of Labor Statistics)


46. Gas Mileage An automotive manufacturer claims the standard deviation
for the gas mileage of its models is 3.9 miles per gallon.


47. Health Care Visits A researcher claims that the proportion of people who
have had no health care visits in the past year is less than 17%. (Adapted from
National Center for Health Statistics)


48. Calories A sports drink maker claims the mean calorie content of its
beverages is 72 calories per serving.


49. Writing Hypotheses: Medicine Your medical research team is investigating
the mean cost of a 30-day supply of a certain heart medication. A
pharmaceutical company thinks that the mean cost is less than $60. You
want to support this claim. How would you write the null and alternative
hypotheses?


50. Writing Hypotheses: Taxicab Company A taxicab company claims that
the mean travel time between two destinations is about 21 minutes. You work
for the bus company and want to reject this claim. How would you write the
null and alternative hypotheses?


51. Writing Hypotheses: Refrigerator Manufacturer A refrigerator manufacturer
claims that the mean life of its competitor’s refrigerators is less than
15 years. You are asked to perform a hypothesis test to test this claim. How
would you write the null and alternative hypotheses if

(a) you represent the manufacturer and want to support the claim?

(b) you represent the competitor and want to reject the claim?


52. Writing Hypotheses: Internet Provider An Internet provider is trying to
gain advertising deals and claims that the mean time a customer spends
online per day is greater than 28 minutes. You are asked to test this claim.
How would you write the null and alternative hypotheses if

(a) you represent the Internet provider and want to support the claim?

(b) you represent a competing advertiser and want to reject the claim?


EXTENDING CONCEPTS


53. Getting at the Concept Why can decreasing the probability of a type I error 
cause an increase in the probability of a type II error? 


54. Getting at the Concept Explain why a level of significance of α = 0 is not
used.


55. Writing A null hypothesis is rejected with a level of significance of 0.05. 
Is it also rejected at a level of significance of 0.10? Explain. 


56. Writing A null hypothesis is rejected with a level of significance of 0.10. 
Is it also rejected at a level of significance of 0.05? Explain. 


Graphical Analysis In Exercises 57-60, you are given a null hypothesis and 
three confidence intervals that represent three samplings. Decide whether each 
confidence interval indicates that you should reject H₀. Explain your reasoning.


57. H₀: μ ≥ 70
    [Number line from 67 to 73 for H₀ and for each interval]
    (a) 67 < μ < 71
    (b) 67 < μ < 69
    (c) 69.5 < μ < 72.5


58. H₀: μ ≤ 54
    [Number line from 51 to 57 for H₀ and for each interval]
    (a) 53.5 < μ < 56.5
    (b) 51.5 < μ < 54.5
    (c) 54.5 < μ < 55.5


59. H₀: p ≤ 0.20
    [Number line from 0.17 to 0.23 for H₀ and for each interval]
    (a) 0.21 < p < 0.23
    (b) 0.19 < p < 0.23
    (c) 0.175 < p < 0.205


60. H₀: p = 0.73
    [Number line from 0.70 to 0.76 for H₀ and for each interval]
    (a) 0.73 < p < 0.75
    (b) 0.715 < p < 0.725
    (c) 0.695 < p < 0.745


7.2 Hypothesis Testing for the Mean (Large Samples)


WHAT YOU SHOULD LEARN

> How to find P-values and use them to test a mean μ
> How to use P-values for a z-test
> How to find critical values and rejection regions in a normal distribution
> How to use rejection regions for a z-test

Using P-Values to Make Decisions > Using P-Values for a z-Test > Rejection
Regions and Critical Values > Using Rejection Regions for a z-Test


> USING P-VALUES TO MAKE DECISIONS

In Chapter 5, you learned that when the sample size is at least 30, the sampling
distribution for x̄ (the sample mean) is normal. In Section 7.1, you learned that a
way to reach a conclusion in a hypothesis test is to use a P-value for the sample
statistic, such as x̄. Recall that when you assume the null hypothesis is true, a
P-value (or probability value) of a hypothesis test is the probability of obtaining
a sample statistic with a value as extreme or more extreme than the one
determined from the sample data. The decision rule for a hypothesis test based
on a P-value is as follows.

DECISION RULE BASED ON P-VALUE

To use a P-value to make a conclusion in a hypothesis test, compare the
P-value with α.
1. If P ≤ α, then reject H₀.
2. If P > α, then fail to reject H₀.


EXAMPLE 1

> Interpreting a P-Value

The P-value for a hypothesis test is P = 0.0237. What is your decision if the
level of significance is (1) α = 0.05 and (2) α = 0.01?

> Solution

1. Because 0.0237 < 0.05, you should reject the null hypothesis.
2. Because 0.0237 > 0.01, you should fail to reject the null hypothesis.

> Try It Yourself 1

The P-value for a hypothesis test is P = 0.0347. What is your decision if the
level of significance is (1) α = 0.01 and (2) α = 0.05?

a. Compare the P-value with the level of significance.
b. Make a decision.
Answer: Page A41


INSIGHT

The lower the P-value, the more evidence there is in favor of rejecting H₀.
The P-value gives you the lowest level of significance for which the sample
statistic allows you to reject the null hypothesis. In Example 1, you would
reject H₀ at any level of significance greater than or equal to 0.0237.


FINDING THE P-VALUE FOR A HYPOTHESIS TEST

After determining the hypothesis test’s standardized test statistic and the
test statistic’s corresponding area, do one of the following to find the
P-value.

a. For a left-tailed test, P = (Area in left tail).
b. For a right-tailed test, P = (Area in right tail).
c. For a two-tailed test, P = 2(Area in tail of test statistic).
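
The three rules above can also be sketched in software rather than read from Table 4. The following is a minimal Python illustration, not part of the text's method: it assumes Python 3.8 or later (for `statistics.NormalDist`), and the function name `p_value_from_z` and the tail labels "left", "right", and "two" are hypothetical names chosen for this sketch.

```python
from statistics import NormalDist


def p_value_from_z(z: float, tail: str) -> float:
    """Return the P-value for a standardized test statistic z.

    tail is "left", "right", or "two" (labels assumed for this sketch).
    """
    left_area = NormalDist().cdf(z)  # area under the standard normal curve left of z
    if tail == "left":
        return left_area
    if tail == "right":
        return 1.0 - left_area
    if tail == "two":
        # Twice the area in the tail on the same side as the test statistic.
        return 2.0 * min(left_area, 1.0 - left_area)
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Left-tailed test with z = -2.23 and two-tailed test with z = 2.14.
print(round(p_value_from_z(-2.23, "left"), 4))
print(round(p_value_from_z(2.14, "two"), 4))
```

Because software does not round the area to four table digits first, results can differ from Table 4 in the last decimal place.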


[Figure: Left-Tailed Test. The area to the left of z = −2.23 is P = 0.0129.]

[Figure: Two-Tailed Test. The area to the right of z = 2.14 is 0.0162, so
P = 2(0.0162) = 0.0324.]


EXAMPLE 2 


> Finding a P-Value for a Left-Tailed Test 


Find the P-value for a left-tailed hypothesis test with a test statistic of
z = −2.23. Decide whether to reject H₀ if the level of significance is α = 0.01.


> Solution


The graph shows a standard normal curve with a shaded area to the left of
z = −2.23. For a left-tailed test,


P = (Area in left tail).


From Table 4 in Appendix B, the area corresponding to z = −2.23 is 0.0129,
which is the area in the left tail. So, the P-value for a left-tailed hypothesis test
with a test statistic of z = −2.23 is P = 0.0129.


Interpretation Because the P-value of 0.0129 is greater than 0.01, you should
fail to reject H₀.


> Try It Yourself 2 


Find the P-value for a left-tailed hypothesis test with a test statistic of
z = −1.71. Decide whether to reject H₀ if the level of significance is
α = 0.05.


a. Use Table 4 in Appendix B to find the area that corresponds to z = −1.71.
b. Calculate the P-value for a left-tailed test, the area in the left tail.
c. Compare the P-value with α and decide whether to reject H₀.
Answer: Page A41


EXAMPLE 3 


> Finding a P-Value for a Two-Tailed Test 


Find the P-value for a two-tailed hypothesis test with a test statistic of
z = 2.14. Decide whether to reject H₀ if the level of significance is α = 0.05.


> Solution

The graph shows a standard normal curve with shaded areas to the left of
z = −2.14 and to the right of z = 2.14. For a two-tailed test,

P = 2(Area in tail of test statistic).


From Table 4, the area corresponding to z = 2.14 is 0.9838. The area in the
right tail is 1 − 0.9838 = 0.0162. So, the P-value for a two-tailed hypothesis
test with a test statistic of z = 2.14 is


P = 2(0.0162) = 0.0324.


Interpretation Because the P-value of 0.0324 is less than 0.05, you should
reject H₀.


> Try It Yourself 3 


Find the P-value for a two-tailed hypothesis test with a test statistic of
z = 1.64. Decide whether to reject H₀ if the level of significance is α = 0.10.


a. Use Table 4 to find the area that corresponds to z = 1.64.
b. Calculate the P-value for a two-tailed test, twice the area in the tail of the
test statistic.
c. Compare the P-value with α and decide whether to reject H₀.
Answer: Page A41


> USING P-VALUES FOR A z-TEST


The z-test for the mean is used in populations for which the sampling distribution
of sample means is normal. To use the z-test, you need to find the standardized
value for your test statistic x̄.

    z = [(Sample mean) − (Hypothesized mean)] / (Standard error)

With all hypothesis tests, it is helpful to sketch the sampling distribution. Your
sketch should include the standardized test statistic.

The z-test for a mean is a statistical test for a population mean. The z-test can
be used when the population is normal and σ is known, or for any
population when the sample size n is at least 30. The test statistic is the
sample mean x̄ and the standardized test statistic is

    z = (x̄ − μ) / (σ/√n).

Recall that σ/√n = standard error = σ_x̄.

When n ≥ 30, you can use the sample standard deviation s in place of σ.


INSIGHT

When the sample size is at least 30, you know the following about the sampling
distribution of sample means.
(1) The shape is normal.
(2) The mean is the hypothesized mean.
(3) The standard error is s/√n, where s is used in place of σ.


Using P-Values for a z-Test for a Mean μ

   IN WORDS                                   IN SYMBOLS
1. State the claim mathematically             State H₀ and Hₐ.
   and verbally. Identify the null
   and alternative hypotheses.
2. Specify the level of significance.         Identify α.
3. Determine the standardized test            z = (x̄ − μ) / (σ/√n), or, if n ≥ 30,
   statistic.                                 use σ ≈ s.
4. Find the area that corresponds             Use Table 4 in Appendix B.
   to z.
5. Find the P-value.
   a. For a left-tailed test, P = (Area in left tail).
   b. For a right-tailed test, P = (Area in right tail).
   c. For a two-tailed test, P = 2(Area in tail of test statistic).
6. Make a decision to reject or fail          Reject H₀ if the P-value is less than
   to reject the null hypothesis.             or equal to α. Otherwise, fail to
                                              reject H₀.
7. Interpret the decision in the context
   of the original claim.


EXAMPLE 4


> Hypothesis Testing Using P-Values


In auto racing, a pit stop is where a racing vehicle stops for new tires, fuel,
repairs, and other mechanical adjustments. The efficiency of a pit crew that
makes these adjustments can affect the outcome of a race. A pit crew claims
that its mean pit stop time (for 4 new tires and fuel) is less than 13 seconds. A
random selection of 32 pit stop times has a sample mean of 12.9 seconds and a
standard deviation of 0.19 second. Is there enough evidence to support the
claim at α = 0.01? Use a P-value.


> Solution


The claim is “the mean pit stop time is less than 13 seconds.” So, the null and
alternative hypotheses are


H₀: μ ≥ 13 seconds and Hₐ: μ < 13 seconds. (Claim)


The level of significance is α = 0.01. The standardized test statistic is


    z = (x̄ − μ) / (σ/√n)            Because n ≥ 30, use the z-test.
      ≈ (12.9 − 13) / (0.19/√32)     Because n ≥ 30, use σ ≈ s = 0.19. Assume μ = 13.
      ≈ −2.98.


In Table 4 in Appendix B, the area corresponding to z = −2.98 is 0.0014.
Because this test is a left-tailed test, the P-value is equal to the area to the left
of z = −2.98. So, P = 0.0014. Because the P-value is less than α = 0.01, you
should decide to reject the null hypothesis.


[Figure: Left-Tailed Test. The area to the left of z = −2.98 is P = 0.0014.]


Interpretation There is enough evidence at the 1% level of significance to
support the claim that the mean pit stop time is less than 13 seconds.
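

The steps of Example 4 can be reproduced from the summary statistics alone. The sketch below is illustrative only and rests on a few assumptions: Python 3.8 or later, σ replaced by s because n ≥ 30, and a hypothetical helper name `z_test_p_value` with tail labels "left", "right", and "two" that do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, hypothesized_mean, s, n, tail):
    """Return (z, P) for a z-test for a mean using summary statistics.

    Assumes n >= 30 so that the sample standard deviation s stands in for sigma.
    """
    z = (sample_mean - hypothesized_mean) / (s / sqrt(n))  # standardized test statistic
    left_area = NormalDist().cdf(z)                        # area to the left of z
    if tail == "left":
        p = left_area
    elif tail == "right":
        p = 1.0 - left_area
    elif tail == "two":
        p = 2.0 * min(left_area, 1.0 - left_area)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p


# Pit stop data from Example 4: x-bar = 12.9, mu = 13, s = 0.19, n = 32, alpha = 0.01.
alpha = 0.01
z, p = z_test_p_value(12.9, 13.0, 0.19, 32, "left")
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, P = {p:.4f}, decision: {decision}")
```

The sketch does not round z to −2.98 before finding the area, so its P-value is roughly 0.0015 rather than the table value 0.0014. The decision to reject H₀ at α = 0.01 is the same either way.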


> Try It Yourself 4


Homeowners claim that the mean speed of automobiles traveling on their
street is greater than the speed limit of 35 miles per hour. A random sample of
100 automobiles has a mean speed of 36 miles per hour and a standard
deviation of 4 miles per hour. Is there enough evidence to support the claim at
α = 0.05? Use a P-value.


a. Identify the claim. Then state the null and alternative hypotheses.
b. Identify the level of significance.
c. Find the standardized test statistic z.
d. Find the P-value.
e. Decide whether to reject the null hypothesis.
f. Interpret the decision in the context of the original claim.
Answer: Page A41


EXAMPLE 5

See MINITAB steps.


> Hypothesis Testing Using P-Values


The National Institute of Diabetes and Digestive and Kidney Diseases reports
that the average cost of bariatric (weight loss) surgery is about $22,500. You
think this information is incorrect. You randomly select 30 bariatric surgery
patients and find that the average cost for their surgeries is $21,545 with a
standard deviation of $3015. Is there enough evidence to support your claim
at α = 0.05? Use a P-value. (Adapted from National Institute of Diabetes and
Digestive and Kidney Diseases)


> Solution


The claim is “the mean is different from $22,500.” So, the null and alternative
hypotheses are


H₀: μ = $22,500
and
Hₐ: μ ≠ $22,500. (Claim)


The level of significance is α = 0.05. The standardized test statistic is


    z = (x̄ − μ) / (σ/√n)                  Because n ≥ 30, use the z-test.
      ≈ (21,545 − 22,500) / (3015/√30)     Because n ≥ 30, use σ ≈ s = 3015. Assume μ = 22,500.
      ≈ −1.73.


In Table 4, the area corresponding to z = −1.73 is 0.0418. Because the test is
a two-tailed test, the P-value is equal to twice the area to the left of z = −1.73.
So,

P = 2(0.0418)
  = 0.0836.


Because the P-value is greater than α, you should fail to reject the null
hypothesis.


[Figure: Two-Tailed Test. The area to the left of z = −1.73 is 0.0418, so
P = 2(0.0418) = 0.0836.]


Interpretation There is not enough evidence at the 5% level of significance
to support the claim that the mean cost of bariatric surgery is different from
$22,500.


> Try It Yourself 5


One of your distributors reports an average of 150 sales per day. You suspect
that this average is not accurate, so you randomly select 35 days and determine
the number of sales each day. The sample mean is 143 daily sales with a
standard deviation of 15 sales. At α = 0.01, is there enough evidence to doubt
the distributor’s reported average? Use a P-value.


a. Identify the claim. Then state the null and alternative hypotheses.
b. Identify the level of significance.
c. Find the standardized test statistic z.
d. Find the P-value.
e. Decide whether to reject the null hypothesis.
f. Interpret the decision in the context of the original claim.
Answer: Page A41


STUDY TIP

Using a TI-83/84 Plus, you can either enter the original data into a list to find
a P-value or enter the descriptive statistics.

STAT
Choose the TESTS menu.
1: Z-Test...

Select the Data input option if you use the original data. Select the Stats input
option if you use the descriptive statistics. In each case, enter the appropriate
values including the corresponding type of hypothesis test indicated by the
alternative hypothesis. Then select Calculate.


INSIGHT

If the test statistic falls in a rejection region, it would be considered an
unusual event.

EXAMPLE 6


> Using a Technology Tool to Find a P-Value


What decision should you make for the following TI-83/84 Plus displays, using
a level of significance of α = 0.05?


[Figure: two TI-83/84 Plus Z-Test displays]


> Solution


The P-value for this test is given as 0.0440464253. Because the P-value is less
than 0.05, you should reject the null hypothesis.


> Try It Yourself 6


For the TI-83/84 Plus hypothesis test shown in Example 6, make a decision at
the α = 0.01 level of significance.


a. Compare the P-value with the level of significance.
b. Make your decision.
Answer: Page A41


> REJECTION REGIONS AND CRITICAL VALUES 


Another method to decide whether to reject the null hypothesis is to determine 
whether the standardized test statistic falls within a range of values called the 
rejection region of the sampling distribution. 


DEFINITION 


A rejection region (or critical region) of the sampling distribution is the range
of values for which the null hypothesis is not probable. If a test statistic falls in
this region, the null hypothesis is rejected. A critical value z₀ separates the
rejection region from the nonrejection region.


GUIDELINES


Finding Critical Values in a Normal Distribution


1. Specify the level of significance α.

2. Decide whether the test is left-tailed, right-tailed, or two-tailed.

3. Find the critical value(s) z₀. If the hypothesis test is
a. left-tailed, find the z-score that corresponds to an area of α.
b. right-tailed, find the z-score that corresponds to an area of 1 − α.
c. two-tailed, find the z-scores that correspond to ½α and 1 − ½α.


4. Sketch the standard normal distribution. Draw a vertical line at each
critical value and shade the rejection region(s).

If you cannot find the exact area in Table 4, use the area that is closest.
When the area is exactly midway between two areas in the table, use the z-score
midway between the corresponding z-scores.
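

As an alternative to searching Table 4, the critical values described in the guidelines can be computed with an inverse normal function. The sketch below is a hedged illustration, assuming Python 3.8 or later; the function name `critical_values` and the tail labels are hypothetical and not part of the text.

```python
from statistics import NormalDist


def critical_values(alpha: float, tail: str) -> tuple:
    """Return the critical value(s) z0 for a z-test at significance level alpha."""
    std_normal = NormalDist()  # standard normal distribution
    if tail == "left":
        # z-score with an area of alpha to its left
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        # z-score with an area of 1 - alpha to its left
        return (std_normal.inv_cdf(1.0 - alpha),)
    if tail == "two":
        # z-scores with areas of alpha/2 and 1 - alpha/2 to their left
        z0 = std_normal.inv_cdf(1.0 - alpha / 2.0)
        return (-z0, z0)
    raise ValueError("tail must be 'left', 'right', or 'two'")


for alpha, tail in [(0.01, "left"), (0.05, "two")]:
    rounded = [round(z0, 3) for z0 in critical_values(alpha, tail)]
    print(f"alpha = {alpha}, {tail}-tailed: {rounded}")
```

Computed values are not limited to the two decimal places of the table, so they may differ slightly from table-based critical values in the third decimal place.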


EXAMPLE 7 


> Finding a Critical Value for a Left-Tailed Test 
Find the critical value and rejection region for a left-tailed test with α = 0.01.


> Solution


The graph shows a standard normal curve with a shaded area of 0.01 in the left
tail. In Table 4, the z-score that is closest to an area of 0.01 is −2.33. So, the
critical value is z₀ = −2.33. The rejection region is to the left of this critical
value.


[Figure: 1% Level of Significance. Standard normal curve with the rejection
region shaded to the left of z₀ = −2.33.]

> Try It Yourself 7


Find the critical value and rejection region for a left-tailed test with α = 0.10.


a. Draw a graph of the standard normal curve with an area of α in the left tail.
b. Use Table 4 to find the area that is closest to α.
c. Find the z-score that corresponds to this area.
d. Identify the rejection region.
Answer: Page A41


EXAMPLE 8


> Finding a Critical Value for a Two-Tailed Test

Find the critical values and rejection regions for a two-tailed test with α = 0.05.


> Solution


The graph shows a standard normal curve with shaded areas of ½α = 0.025 in
each tail. The area to the left of −z₀ is ½α = 0.025, and the area to the left of z₀
is 1 − ½α = 0.975. In Table 4, the z-scores that correspond to the areas 0.025 and
0.975 are −1.96 and 1.96, respectively. So, the critical values are −z₀ = −1.96 and
z₀ = 1.96. The rejection regions are to the left of −1.96 and to the right of 1.96.


[Figure: 5% Level of Significance. Standard normal curve with ½α = 0.025 shaded
in each tail, −z₀ = −1.96, and z₀ = 1.96.]


STUDY TIP

Notice in Example 8 that the critical values are opposites. This is always
true for two-tailed z-tests.

The table lists the critical values for commonly used levels of significance.

    Alpha   Tail    z
    0.10    Left    −1.28
            Right   1.28
            Two     ±1.645
    0.05    Left    −1.645
            Right   1.645
            Two     ±1.96
    0.01    Left    −2.33
            Right   2.33
            Two     ±2.575


> Try It Yourself 8


Find the critical values and rejection regions for a two-tailed test with
α = 0.08.


a. Draw a graph of the standard normal curve with an area of ½α in each tail.
b. Use Table 4 to find the areas that are closest to ½α and 1 − ½α.
c. Find the z-scores that correspond to these areas.
d. Identify the rejection regions.
Answer: Page A41


> USING REJECTION REGIONS FOR A z-TEST


To conclude a hypothesis test using rejection region(s), you make a decision and
interpret the decision as follows.


DECISION RULE BASED ON REJECTION REGION


To use a rejection region to conduct a hypothesis test, calculate the standardized
test statistic z. If the standardized test statistic

1. is in the rejection region, then reject H₀.
2. is not in the rejection region, then fail to reject H₀.


[Figures: Left-Tailed Test (reject H₀ when z < z₀), Right-Tailed Test (reject H₀
when z > z₀), and Two-Tailed Test (reject H₀ when z < −z₀ or z > z₀). In each
case, fail to reject H₀ otherwise.]


Failing to reject the null hypothesis does not mean that you have accepted 
the null hypothesis as true. It simply means that there is not enough evidence to 
reject the null hypothesis. 


GUIDELINES 


Using Rejection Regions for a z-Test for a Mean μ

   IN WORDS                                   IN SYMBOLS
1. State the claim mathematically             State H₀ and Hₐ.
   and verbally. Identify the null
   and alternative hypotheses.
2. Specify the level of significance.         Identify α.
3. Determine the critical value(s).           Use Table 4 in Appendix B.
4. Determine the rejection region(s).
5. Find the standardized test statistic       z = (x̄ − μ) / (σ/√n), or, if n ≥ 30,
   and sketch the sampling distribution.      use σ ≈ s.
6. Make a decision to reject or fail to       If z is in the rejection region,
   reject the null hypothesis.                reject H₀. Otherwise, fail to
                                              reject H₀.
7. Interpret the decision in the context
   of the original claim.


Each year, the Environmental Protection Agency (EPA) publishes reports of gas
mileage for all makes and models of passenger vehicles. In a recent year, small
station wagons with automatic transmissions that posted the best mileage were
the Audi A3 (diesel) and the Volkswagen Jetta SportWagen (diesel). Each had
a mean mileage of 30 miles per gallon (city) and 42 miles per gallon (highway).
Suppose that Volkswagen believes a Jetta SportWagen exceeds 42 miles per
gallon on the highway. To support its claim, it tests 36 vehicles on highway
driving and obtains a sample mean of 43.2 miles per gallon with a standard
deviation of 2.1 miles per gallon. (Source: U.S. Department of Energy)

Is the evidence strong enough to support the claim that the Jetta SportWagen’s
highway miles per gallon exceeds the EPA estimate? Use a z-test with α = 0.01.


EXAMPLE 9

See TI-83/84 Plus steps on page 425.


> Testing μ with a Large Sample


Employees at a construction and mining company claim that the mean
salary of the company’s mechanical engineers is less than that of one of its
competitors, which is $68,000. A random sample of 30 of the company’s
mechanical engineers has a mean salary of $66,900 with a standard deviation
of $5500. At α = 0.05, test the employees’ claim.


> Solution 


The claim is “the mean salary is less than $68,000.” So, the null and alternative 
hypotheses can be written as 


H₀: μ ≥ $68,000 and Hₐ: μ < $68,000. (Claim)


Because the test is a left-tailed test and the level of significance is α = 0.05,
the critical value is z₀ = −1.645 and the rejection region is z < −1.645. The
standardized test statistic is


    z = (x̄ − μ) / (σ/√n)                  Because n ≥ 30, use the z-test.
      ≈ (66,900 − 68,000) / (5500/√30)     Because n ≥ 30, use σ ≈ s = 5500. Assume μ = 68,000.
      ≈ −1.10.


The graph shows the location of the rejection region and the standardized
test statistic z. Because z is not in the rejection region, you fail to reject the null
hypothesis.


[Figure: 5% Level of Significance. Standard normal curve with 1 − α = 0.95,
z₀ = −1.645, and z ≈ −1.10.]


Interpretation There is not enough evidence at the 5% level of significance to
support the employees’ claim that the mean salary is less than $68,000.


Be sure you understand the decision made in this example. Even though
your sample has a mean of $66,900, you cannot (at a 5% level of significance)
support the claim that the mean of all the mechanical engineers’ salaries is less
than $68,000. The difference between your test statistic and the hypothesized
mean is probably due to sampling error.
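

The rejection-region decision in Example 9 can be checked with a short script. This is a minimal sketch under stated assumptions: Python 3.8 or later, σ approximated by s because n ≥ 30, and a hypothetical function name `rejection_region_z_test` with tail labels "left", "right", and "two". The critical value is computed with an inverse normal function rather than read from Table 4.

```python
from math import sqrt
from statistics import NormalDist


def rejection_region_z_test(sample_mean, hypothesized_mean, s, n, alpha, tail):
    """Return (z, z0, reject) for a z-test for a mean using a rejection region.

    Assumes n >= 30 so that s stands in for sigma. For a two-tailed test,
    z0 is the positive critical value and the regions are z < -z0 and z > z0.
    """
    z = (sample_mean - hypothesized_mean) / (s / sqrt(n))  # standardized test statistic
    std_normal = NormalDist()
    if tail == "left":
        z0 = std_normal.inv_cdf(alpha)
        reject = z < z0
    elif tail == "right":
        z0 = std_normal.inv_cdf(1.0 - alpha)
        reject = z > z0
    elif tail == "two":
        z0 = std_normal.inv_cdf(1.0 - alpha / 2.0)
        reject = abs(z) > z0
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, z0, reject


# Salary data from Example 9: x-bar = 66,900, mu = 68,000, s = 5500, n = 30, alpha = 0.05.
z, z0, reject = rejection_region_z_test(66900, 68000, 5500, 30, 0.05, "left")
print(f"z = {z:.2f}, z0 = {z0:.3f}, reject H0: {reject}")
```

Here z is about −1.10, which lies to the right of the critical value of about −1.645, so the sketch reaches the same decision as the example: fail to reject H₀.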


> Try It Yourself 9


The CEO of the company claims that the mean work day of the company’s
mechanical engineers is less than 8.5 hours. A random sample of 35 of the
company’s mechanical engineers has a mean work day of 8.2 hours with a
standard deviation of 0.5 hour. At α = 0.01, test the CEO’s claim.


a. Identify the claim and state H₀ and Hₐ.
b. Identify the level of significance α.
c. Find the critical value z₀ and identify the rejection region.
d. Find the standardized test statistic z. Sketch a graph.
e. Decide whether to reject the null hypothesis.
f. Interpret the decision in the context of the original claim.
Answer: Page A41


EXAMPLE 10


> Testing μ with a Large Sample

The U.S. Department of Agriculture claims that the mean cost of raising a child
from birth to age 2 by husband-wife families in the United States is $13,120.
A random sample of 500 children (age 2) has a mean cost of $12,925 with a
standard deviation of $1745. At α = 0.10, is there enough evidence to reject
the claim? (Adapted from U.S. Department of Agriculture Center for Nutrition Policy
and Promotion)


> Solution


The claim is “the mean cost is $13,120.” So, the null and alternative hypotheses
are


H₀: μ = $13,120 (Claim)
and
Hₐ: μ ≠ $13,120.


Because the test is a two-tailed test and the level of significance is α = 0.10,
the critical values are −z₀ = −1.645 and z₀ = 1.645. The rejection regions are
z < −1.645 and z > 1.645. The standardized test statistic is


    z = (x̄ − μ) / (σ/√n)                   Because n ≥ 30, use the z-test.
      ≈ (12,925 − 13,120) / (1745/√500)     Because n ≥ 30, use σ ≈ s = 1745. Assume μ = 13,120.
      ≈ −2.50.


The graph shows the location of the rejection regions and the standardized
test statistic z. Because z is in the rejection region, you should reject the
null hypothesis.


[Figure: standard normal curve with 1 − α = 0.90, ½α = 0.05 in each tail,
−z₀ = −1.645, z₀ = 1.645, and z ≈ −2.50.]


Interpretation There is enough evidence at the 10% level of significance to reject
the claim that the mean cost of raising a child from birth to age 2 by husband-wife
families in the United States is $13,120.


STUDY TIP

Using a TI-83/84 Plus, you can find the standardized test statistic automatically.


> Try It Yourself 10


Using the information and results of Example 10, determine whether there is
enough evidence to reject the claim that the mean cost of raising a child from
birth to age 2 by husband-wife families in the United States is $13,120. Use
α = 0.01.


a. Identify the level of significance α.
b. Find the critical values −z₀ and z₀ and identify the rejection regions.
c. Sketch a graph. Decide whether to reject the null hypothesis.
d. Interpret the decision in the context of the original claim.
Answer: Page A41


EXERCISES


BUILDING BASIC SKILLS AND VOCABULARY


1. Explain the difference between the z-test for μ using rejection region(s) and
the z-test for μ using a P-value.


2. In hypothesis testing, does choosing between the critical value method or the
P-value method affect your conclusion? Explain.


In Exercises 3-8, find the P-value for the indicated hypothesis test with the given
standardized test statistic z. Decide whether to reject H₀ for the given level of
significance α.


3. Left-tailed test, z = −1.32, α = 0.10
4. Left-tailed test, z = −1.55, α = 0.05
5. Right-tailed test, z = 2.46, α = 0.01
6. Right-tailed test, z = 1.23, α = 0.10
7. Two-tailed test, z = −1.68, α = 0.05
8. Two-tailed test, z = 2.30, α = 0.01


Graphical Analysis In Exercises 9-12, match each P-value with the graph that 
displays its area. The graphs are labeled (a)-(d). 


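9. P = 0.0089                   10. P = 0.3050
11. P = 0.0688                  12. P = 0.0287

(a) [Graph: standard normal curve with z = 1.90]
(b) [Graph: standard normal curve with z = −2.37]
(c) [Graph: standard normal curve with z = 1.82]
(d) [Graph: standard normal curve with z = −0.51]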

13. Given H₀: μ = 100, Hₐ: μ ≠ 100, and P = 0.0461.

(a) Do you reject or fail to reject H₀ at the 0.01 level of significance?
(b) Do you reject or fail to reject H₀ at the 0.05 level of significance?


14. Given H₀: μ ≥ 8.5, Hₐ: μ < 8.5, and P = 0.0691.

(a) Do you reject or fail to reject H₀ at the 0.01 level of significance?
(b) Do you reject or fail to reject H₀ at the 0.05 level of significance?


In Exercises 15 and 16, use the TI-83/84 Plus displays to make a decision to reject
or fail to reject the null hypothesis at the given level of significance.


15. α = 0.05   [Figure: TI-83/84 Plus Z-Test display]

16. α = 0.01   [Figure: TI-83/84 Plus Z-Test display]


Finding Critical Values In Exercises 17-22, find the critical value(s) for the
indicated type of test and level of significance α. Include a graph with your
answer.

17. Right-tailed test, α = 0.05          18. Right-tailed test, α = 0.08
19. Left-tailed test, α = 0.03           20. Left-tailed test, α = 0.09
21. Two-tailed test, α = 0.02            22. Two-tailed test, α = 0.10


Graphical Analysis In Exercises 23 and 24, state whether each standardized
test statistic z allows you to reject the null hypothesis. Explain your reasoning.


23. (a) z = −1.301               24. (a) z = 1.98
    (b) z = 1.203                    (b) z = 1.89
    (c) z = 1.280                    (c) z = 1.65
    (d) z = 1.286                    (d) z = −1.99
    [Graph: z₀ = 1.285]              [Graph: −z₀ = −1.96, z₀ = 1.96]


In Exercises 25-28, test the claim about the population mean μ at the given level
of significance α using the given sample statistics.


25. Claim: μ = 40; α = 0.05.
Sample statistics: x̄ = 39.2, s = 3.23, n = 75


26. Claim: μ > 1745; α = 0.10.
Sample statistics: x̄ = 1752, s = 38, n = 44


27. Claim: μ ≠ 8550; α = 0.02.
Sample statistics: x̄ = 8420, s = 314, n = 38


28. Claim: μ = 22,500; α = 0.01.
Sample statistics: x̄ = 23,250, s = 1200, n = 45
USING AND INTERPRETING CONCEPTS


Testing Claims Using P-Values In Exercises 29-34,
(a) write the claim mathematically and identify H0 and Ha.
(b) find the standardized test statistic z and its corresponding area. If convenient,
    use technology.
(c) find the P-value. If convenient, use technology.
(d) decide whether to reject or fail to reject the null hypothesis.
(e) interpret the decision in the context of the original claim.


29. MCAT Scores A random sample of 50 medical school applicants at a
university has a mean raw score of 31 with a standard deviation of 2.5 on the
multiple choice portions of the Medical College Admission Test (MCAT).
A student says that the mean raw score for the school’s applicants is more
than 30. At α = 0.01, is there enough evidence to support the student’s
claim? (Adapted from Association of American Medical Colleges)

30. Sprinkler Systems A manufacturer of sprinkler systems designed for fire
protection claims that the average activating temperature is at least 135°F.
To test this claim, you randomly select a sample of 32 systems and find the
mean activation temperature to be 133°F with a standard deviation of 3.3°F. At
α = 0.10, do you have enough evidence to reject the manufacturer’s claim?

31. Bottled Water Consumption The U.S. Department of Agriculture claims
that the mean consumption of bottled water by a person in the United States
is 28.5 gallons per year. A random sample of 100 people in the United States
has a mean bottled water consumption of 27.8 gallons per year with a
standard deviation of 4.1 gallons. At α = 0.08, can you reject the claim?
(Adapted from U.S. Department of Agriculture)

32. Coffee Consumption The U.S. Department of Agriculture claims that the
mean consumption of coffee by a person in the United States is 24.2 gallons
per year. A random sample of 120 people in the United States shows that the
mean coffee consumption is 23.5 gallons per year with a standard deviation
of 3.2 gallons. At α = 0.05, can you reject the claim? (Adapted from U.S.
Department of Agriculture)


33. Quitting Smoking The lengths of time (in years) it took a random
sample of 32 former smokers to quit smoking permanently are listed.
At α = 0.05, is there enough evidence to reject the claim that the mean
time it takes smokers to quit smoking permanently is 15 years? (Adapted
from The Gallup Organization)


15.7 13.2 22.6 13.0 10.7 18.1 14.7  7.0 17.3  7.5 21.8
12.3 19.8 13.8 16.0 15.5 13.1 20.7 15.5  9.8 11.9 16.9
 7.0 19.3 13.2 14.6 20.9 15.4 13.3 11.6 10.9 21.6


34. Salaries An analyst claims that the mean annual salary for advertising 
account executives in Denver, Colorado is more than the national mean, 
$66,200. The annual salaries (in dollars) for a random sample of 
35 advertising account executives in Denver are listed. At α = 0.09,
is there enough evidence to support the analyst’s claim? (Adapted from 
Salary.com) 


69,450 65,910 68,780 66,724 64,125 67,561 62,419 
70,375 65,835 62,653 65,090 67,997 65,176 64,936 
66,716 69,832 63,111 64,550 63,512 65,800 66,150 
68,587 68,276 65,902 63,415 64,519 70,275 70,102 
67,230 65,488 66,225 69,879 69,200 65,179 69,755 
Weight Loss (in pounds)
after One Month

 5 | 77          Key: 5|7 = 5.7
 6 | 67
 7 | 019
 8 | 2279
 9 | 03568
10 | 2566
11 | 12578
12 | 078
13 | 8
14 |
15 | 0

FIGURE FOR EXERCISE 41


Testing Claims Using Critical Values In Exercises 35-42, (a) write the
claim mathematically and identify H0 and Ha, (b) find the critical values and
identify the rejection regions, (c) find the standardized test statistic, (d) decide
whether to reject or fail to reject the null hypothesis, and (e) interpret the decision
in the context of the original claim.


35. Caffeine Content in Colas A company that makes cola drinks states that
the mean caffeine content per 12-ounce bottle of cola is 40 milligrams. You
want to test this claim. During your tests, you find that a random sample of
thirty 12-ounce bottles of cola has a mean caffeine content of 39.2
milligrams with a standard deviation of 7.5 milligrams. At α = 0.01, can you
reject the company’s claim? (Adapted from American Beverage Association)

36. Electricity Consumption The U.S. Energy Information Association claims
that the mean monthly residential electricity consumption in your town is
874 kilowatt-hours (kWh). You want to test this claim. You find that a
random sample of 64 residential customers has a mean monthly electricity
consumption of 905 kWh and a standard deviation of 125 kWh. At α = 0.05,
do you have enough evidence to reject the association’s claim? (Adapted from
U.S. Energy Information Association)

37. Light Bulbs A light bulb manufacturer guarantees that the mean life of a
certain type of light bulb is at least 750 hours. A random sample of 36 light
bulbs has a mean life of 745 hours with a standard deviation of 60 hours. At
α = 0.02, do you have enough evidence to reject the manufacturer’s claim?

38. Fast Food A fast food restaurant estimates that the mean sodium content
in one of its breakfast sandwiches is no more than 920 milligrams. A random
sample of 44 breakfast sandwiches has a mean sodium content of 925 with
a standard deviation of 18 milligrams. At α = 0.10, do you have enough
evidence to reject the restaurant’s claim?


39. Nitrogen Dioxide Levels A scientist estimates that the mean nitrogen 
dioxide level in Calgary is greater than 32 parts per billion. You want to 
test this estimate. To do so, you determine the nitrogen dioxide levels for 
34 randomly selected days. The results (in parts per billion) are listed 
below. At α = 0.06, can you support the scientist’s estimate? (Adapted
from Clean Air Strategic Alliance) 


24 36 44 35 44 34 29 40 39 43 41 32 
33 29 29 43 25 39 25 42 29 22 22 25 
144 15 14 29 25 27 22 24 18 17 


40. Fluorescent Lamps A fluorescent lamp manufacturer guarantees that 
the mean life of a certain type of lamp is at least 10,000 hours. You want to 
test this guarantee. To do so, you record the lives of a random sample of 
32 fluorescent lamps. The results (in hours) are shown below. At α = 0.09,
do you have enough evidence to reject the manufacturer’s claim?


 8,800  9,155 13,001 10,250 10,002 11,413  8,234 10,402
10,016  8,015  6,110 11,005 11,555  9,254  6,991 12,006
10,420  8,302  8,151 10,980 10,186 10,003  8,814 11,445
 6,277  8,632  7,265 10,584  9,397 11,987  7,556 10,380


41. Weight Loss A weight loss program claims that program participants 
have a mean weight loss of at least 10 pounds after 1 month. You work 
for a medical association and are asked to test this claim. A random 
sample of 30 program participants and their weight losses (in pounds) 
after 1 month is listed in the stem-and-leaf plot at the left. At α = 0.03,
do you have enough evidence to reject the program’s claim? 
Evacuation Time (in seconds)

 0 | 79          Key: 0|7 = 7
 1 | 199
 2 | 26799
 3 | 1167799
 4 | 113334667
 5 | 2345788899
 6 | 1334667
 7 | 469
 8 | 46
 9 | 4
10 | 2

FIGURE FOR EXERCISE 42
42. Fire Drill An engineering company claims that the mean time it
takes an employee to evacuate a building during a fire drill is less
than 60 seconds. You want to test this claim. A random sample of 50
employees and their evacuation times (in seconds) is listed in the
stem-and-leaf plot at the left. At α = 0.01, can you support the
company’s claim?


In Exercises 43-46, use StatCrunch to help you test the claim about the
population mean μ at the given level of significance α using the given sample
statistics. For each claim, assume the population is normally distributed.

43. Claim: μ = 58; α = 0.10. Sample statistics: x̄ = 57.6, s = 2.35, n = 80

44. Claim: μ > 495; α = 0.05. Sample statistics: x̄ = 498.4, s = 17.8, n = 65

45. Claim: μ = 1210; α = 0.08. Sample statistics: x̄ = 1234.21, s = 205.87,
n = 250

46. Claim: μ ≠ 28,750; α = 0.01. Sample statistics: x̄ = 29,130, s = 3200,
n = 600


EXTENDING CONCEPTS


47. Water Usage You believe the mean annual water usage of U.S. households
is less than 127,400 gallons. You find that a random sample of 30 households
has a mean water usage of 125,270 gallons with a standard deviation of 6275
gallons. You conduct a statistical experiment where H0: μ ≥ 127,400 and
Ha: μ < 127,400. At α = 0.01, explain why you cannot reject H0. (Adapted
from American Water Works Association)

48. Vehicle Miles of Travel You believe the annual mean vehicle miles of
travel (VMT) per U.S. household is greater than 22,000 miles. You do some
research and find that a random sample of 36 U.S. households has a mean
annual VMT of 22,200 miles with a standard deviation of 775 miles. You
conduct a statistical experiment where H0: μ ≤ 22,000 and Ha: μ > 22,000.
At α = 0.05, explain why you cannot reject H0. (Adapted from U.S. Federal
Highway Administration)

49. Using Different Values of α and n In Exercise 47, you believe that H0 is
not valid. Which of the following allows you to reject H0? Explain your
reasoning.
(a) Use the same values but increase α from 0.01 to 0.02.
(b) Use the same values but increase α from 0.01 to 0.05.
(c) Use the same values but increase n from 30 to 40.
(d) Use the same values but increase n from 30 to 50.

50. Using Different Values of α and n In Exercise 48, you believe that H0 is
not valid. Which of the following allows you to reject H0? Explain your
reasoning.
(a) Use the same values but increase α from 0.05 to 0.06.
(b) Use the same values but increase α from 0.05 to 0.07.
(c) Use the same values but increase n from 36 to 40.
(d) Use the same values but increase n from 36 to 80.


CASE STUDY


Human Body Temperature: What's Normal?


In an article in the Journal of Statistics Education 
(vol. 4, no. 2), Allen Shoemaker describes a study that 
was reported in the Journal of the American Medical 
Association (JAMA).* It is generally accepted that 
the mean body temperature of an adult human is 
98.6°F. In his article, Shoemaker uses the data from 
the JAMA article to test this hypothesis. Here is a 
summary of his test. 
Claim: The body temperature of adults is 98.6°F.

H0: μ = 98.6°F (Claim)     Ha: μ ≠ 98.6°F
Sample Size: n = 130
Population: Adult human temperatures (Fahrenheit)
Distribution: Approximately normal
Test Statistics: x̄ = 98.25, s = 0.73

* Data for the JAMA article were collected from healthy men and women,
ages 18 to 40, at the University of Maryland Center for Vaccine
Development, Baltimore.


Men’s Temperatures
(in degrees Fahrenheit)

 96 | 3               Key: 96|3 = 96.3
 96 | 79
 97 | 0111234444
 97 | 556667888899
 98 | 000000112222334444
 98 | 55666666778889
 99 | 0001234
 99 | 5
100 |
100 |


Women’s Temperatures
(in degrees Fahrenheit)

 96 | 4               Key: 96|4 = 96.4
 96 | 78
 97 | 224
 97 | 677888999
 98 | 00000122222233344444
 98 | 5666677777788888889
 99 | 00112234
 99 | 9
100 | 0
100 | 8


EXERCISES


1. Complete the hypothesis test for all adults (men and women) by performing
the following steps. Use a level of significance of α = 0.05.
(a) Sketch the sampling distribution.
(b) Determine the critical values and add them to your sketch.
(c) Determine the rejection regions and shade them in your sketch.
(d) Find the standardized test statistic. Add it to your sketch.
(e) Make a decision to reject or fail to reject the null hypothesis.
(f) Interpret the decision in the context of the original claim.

2. If you lower the level of significance to α = 0.01, does your decision
change? Explain your reasoning.

3. Test the hypothesis that the mean temperature of men is 98.6°F. What can
you conclude at a level of significance of α = 0.01?

4. Test the hypothesis that the mean temperature of women is 98.6°F. What can
you conclude at a level of significance of α = 0.01?

5. Use the sample of 130 temperatures to form a 99% confidence interval for
the mean body temperature of adult humans.

6. The conventional “normal” body temperature was established by Carl
Wunderlich over 100 years ago. What were possible sources of error in
Wunderlich’s sampling procedure?
7.3 Hypothesis Testing for the Mean (Small Samples)


WHAT YOU SHOULD LEARN
> How to find critical values in a t-distribution
> How to use the t-test to test a mean μ
> How to use technology to find P-values and use them with a t-test to test
  a mean μ

Critical Values in a t-Distribution > The t-Test for a Mean μ
(n < 30, σ unknown) > Using P-Values with t-Tests


> CRITICAL VALUES IN A t-DISTRIBUTION


In Section 7.2, you learned how to perform a hypothesis test for a population
mean when the sample size was at least 30. In real life, it is often not practical to
collect samples of size 30 or more. However, if the population has a normal, or
nearly normal, distribution, you can still test the population mean μ. To do so,
you can use the t-sampling distribution with n - 1 degrees of freedom.


GUIDELINES
Finding Critical Values in a t-Distribution
1. Identify the level of significance α.
2. Identify the degrees of freedom d.f. = n - 1.
3. Find the critical value(s) using Table 5 in Appendix B in the row with
   n - 1 degrees of freedom. If the hypothesis test is
   a. left-tailed, use the “One Tail, α” column with a negative sign.
   b. right-tailed, use the “One Tail, α” column with a positive sign.
   c. two-tailed, use the “Two Tails, α” column with a negative and a
      positive sign.
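
The table lookup described in these guidelines can also be approximated numerically. The following is a minimal sketch, not part of the text's method: it assumes plain Python with only the standard library, integrates the t-density with Simpson's rule, and finds the critical value by bisection. The function names `t_pdf`, `right_tail_area`, and `t_critical` are hypothetical, and the search assumes the critical value lies below 100. Statistical libraries (for example SciPy's `scipy.stats.t.ppf`) offer the same lookup directly.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def right_tail_area(t, df, steps=2000):
    """Area under the density to the right of t (t >= 0), using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df, tail):
    """Critical value(s) for tail = 'left', 'right', or 'two'."""
    if tail not in ("left", "right", "two"):
        raise ValueError("tail must be 'left', 'right', or 'two'")
    # A two-tailed test puts alpha / 2 in each tail.
    area = alpha / 2 if tail == "two" else alpha
    lo, hi = 0.0, 100.0  # assumed search interval for the positive critical value
    for _ in range(60):  # bisection: halve the interval until it is negligible
        mid = (lo + hi) / 2
        if right_tail_area(mid, df) > area:
            lo = mid
        else:
            hi = mid
    t0 = (lo + hi) / 2
    if tail == "left":
        return -t0
    if tail == "right":
        return t0
    return (-t0, t0)

# Compare with the Table 5 row for d.f. = 20, "One Tail, 0.05" column.
print(t_critical(0.05, 20, "left"))
```

The sign handling mirrors step 3 of the guidelines: negative for a left-tailed test, positive for a right-tailed test, and both signs for a two-tailed test.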


EXAMPLE 1


> Finding Critical Values for t
Find the critical value t0 for a left-tailed test with α = 0.05 and n = 21.


> Solution
The degrees of freedom are
d.f. = n - 1
     = 21 - 1
     = 20.

To find the critical value, use Table 5 in Appendix B with d.f. = 20 and
α = 0.05 in the “One Tail, α” column. Because the test is a left-tailed test,
the critical value is negative. So,

t0 = -1.725.

[Figure: t-distribution with the left-tail rejection region shaded;
5% Level of Significance]


> Try It Yourself 1
Find the critical value t0 for a left-tailed test with α = 0.01 and n = 14.


a. Identify the degrees of freedom.
b. Use the “One Tail, α” column in Table 5 in Appendix B to find t0.
Answer: Page A41
EXAMPLE 2


> Finding Critical Values for t
Find the critical value t0 for a right-tailed test with α = 0.01 and n = 17.


> Solution
The degrees of freedom are
d.f. = n - 1
     = 17 - 1
     = 16.

To find the critical value, use Table 5 with d.f. = 16 and α = 0.01 in the
“One Tail, α” column. Because the test is right-tailed, the critical value is
positive. So,

t0 = 2.583.

[Figure: t-distribution with the right-tail rejection region shaded;
1% Level of Significance]


> Try It Yourself 2
Find the critical value t0 for a right-tailed test with α = 0.10 and n = 9.


a. Identify the degrees of freedom.
b. Use the “One Tail, α” column in Table 5 in Appendix B to find t0.
Answer: Page A41


EXAMPLE 3


> Finding Critical Values for t
Find the critical values -t0 and t0 for a two-tailed test with α = 0.10 and
n = 26.

> Solution
The degrees of freedom are
d.f. = n - 1
     = 26 - 1
     = 25.

To find the critical values, use Table 5 with d.f. = 25 and α = 0.10 in the
“Two Tails, α” column. Because the test is two-tailed, one critical value is
negative and one is positive. So,

-t0 = -1.708 and t0 = 1.708.

[Figure: t-distribution with both tails shaded, -t0 = -1.708 and t0 = 1.708;
10% Level of Significance]


> Try It Yourself 3

Find the critical values -t0 and t0 for a two-tailed test with α = 0.05 and
n = 16.


a. Identify the degrees of freedom.
b. Use the “Two Tails, α” column in Table 5 in Appendix B to find t0.
Answer: Page A41
On the basis of a t-test, a decision
was made whether to send
truckloads of waste contaminated
with cadmium to a sanitary
landfill or a hazardous waste
landfill. The trucks were sampled
to determine if the mean level of
cadmium exceeded the allowable
amount of 1 milligram per liter
for a sanitary landfill. Assume
the null hypothesis was μ ≤ 1.
(Adapted from Pacific Northwest National
Laboratory)


                        H0 True     H0 False
Fail to reject H0.
Reject H0.


Describe the possible
type I and type II errors
of this situation.


SECTION 7.3 
> THE t-TEST FOR A MEAN μ (n < 30, σ UNKNOWN)


To test a claim about a mean μ using a small sample (n < 30) from a normal, or
nearly normal, distribution when σ is unknown, you can use a t-sampling
distribution.

t = ((Sample mean) - (Hypothesized mean)) / (Standard error)


t-TEST FOR A MEAN μ


The t-test for a mean is a statistical test for a population mean. The t-test can
be used when the population is normal or nearly normal, σ is unknown, and
n < 30. The test statistic is the sample mean x̄ and the standardized test
statistic is

t = (x̄ - μ) / (s/√n).

The degrees of freedom are

d.f. = n - 1.
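
As a quick numerical check of this formula, the standardized test statistic can be computed in a few lines. This is a minimal sketch in plain Python; the function names `t_statistic` and `degrees_of_freedom` are hypothetical, and the sample figures are the ones used in Example 4 of this section (x̄ = 19,850, hypothesized mean 20,500, s = 1084, n = 14).

```python
import math

def t_statistic(xbar, mu0, s, n):
    """Standardized test statistic t = (xbar - mu0) / (s / sqrt(n))."""
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = s / math.sqrt(n)  # estimated standard error of the mean
    return (xbar - mu0) / standard_error

def degrees_of_freedom(n):
    """Degrees of freedom for the one-sample t-test."""
    return n - 1

t = t_statistic(19850, 20500, 1084, 14)
print(round(t, 3), degrees_of_freedom(14))  # prints: -2.244 13
```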


GUIDELINES

Using the t-Test for a Mean μ (Small Sample)

IN WORDS                                          IN SYMBOLS
1. State the claim mathematically and             State H0 and Ha.
   verbally. Identify the null and
   alternative hypotheses.
2. Specify the level of significance.             Identify α.
3. Identify the degrees of freedom.               d.f. = n - 1
4. Determine the critical value(s).               Use Table 5 in Appendix B.
5. Determine the rejection region(s).
6. Find the standardized test statistic           t = (x̄ - μ) / (s/√n)
   and sketch the sampling distribution.
7. Make a decision to reject or fail to           If t is in the rejection region,
   reject the null hypothesis.                    reject H0. Otherwise, fail to
                                                  reject H0.
8. Interpret the decision in the context
   of the original claim.


Remember that when you make a decision, the possibility of a type I or a
type II error exists.
If you prefer using P-values, turn to page 392 to learn how to use P-values
for a t-test for a mean μ (small sample).
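
The decision steps in these guidelines can be sketched in code. This is an illustration under stated assumptions, not the text's own procedure: the critical value t0 is still read from Table 5 by hand and passed in as a positive number, and the function name `t_test_decision` is hypothetical. The two calls use the figures from Examples 4 and 5 of this section.

```python
import math

def t_test_decision(xbar, mu0, s, n, t0, tail):
    """Return (t, decision) given a positive critical value t0 from a t-table.

    tail is 'left', 'right', or 'two' and is set by the alternative hypothesis.
    """
    t = (xbar - mu0) / (s / math.sqrt(n))
    if tail == "left":
        in_rejection_region = t < -t0
    elif tail == "right":
        in_rejection_region = t > t0
    elif tail == "two":
        in_rejection_region = abs(t) > t0
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    decision = "reject H0" if in_rejection_region else "fail to reject H0"
    return t, decision

# Example 4: left-tailed, d.f. = 13, alpha = 0.05, table value 1.771
print(t_test_decision(19850, 20500, 1084, 14, 1.771, "left"))
# Example 5: two-tailed, d.f. = 18, alpha = 0.05, table value 2.101
print(t_test_decision(6.7, 6.8, 0.24, 19, 2.101, "two"))
```

Keeping the table lookup outside the function mirrors the guidelines, where finding the critical value and computing the test statistic are separate steps.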
To explore this topic further, see Activity 7.3 on page 397.


EXAMPLE 4 (See MINITAB steps on page 424.)


> Testing μ with a Small Sample


A used car dealer says that the mean price of a 2008 Honda CR-V is at least
$20,500. You suspect this claim is incorrect and find that a random sample of
14 similar vehicles has a mean price of $19,850 and a standard deviation of
$1084. Is there enough evidence to reject the dealer’s claim at α = 0.05?
Assume the population is normally distributed. (Adapted from Kelley Blue Book)


> Solution


The claim is “the mean price is at least $20,500.” So, the null and alternative
hypotheses are

H0: μ ≥ $20,500 (Claim)
and
Ha: μ < $20,500.

The test is a left-tailed test, the level of significance is α = 0.05, and the degrees
of freedom are d.f. = 14 - 1 = 13. So, the critical value is t0 = -1.771. The
rejection region is t < -1.771. The standardized test statistic is

t = (x̄ - μ) / (s/√n)                    Because n < 30, use the t-test.
  = (19,850 - 20,500) / (1084/√14)      Assume μ = 20,500.
  ≈ -2.244.

The graph shows the location of the rejection region and the standardized test
statistic t. Because t is in the rejection region, you should reject the null
hypothesis.

[Figure: t-distribution with the rejection region t < -1.771 shaded (α = 0.05)
and t ≈ -2.244 marked; 5% Level of Significance]

Interpretation There is enough evidence at the 5% level of significance to
reject the claim that the mean price of a 2008 Honda CR-V is at least $20,500.


> Try It Yourself 4


An insurance agent says that the mean cost of insuring a 2008 Honda CR-V is
less than $1200. A random sample of 7 similar insurance quotes has a mean
cost of $1125 and a standard deviation of $55. Is there enough evidence to
support the agent’s claim at α = 0.10? Assume the population is normally
distributed.

a. Identify the claim and state H0 and Ha.
b. Identify the level of significance α and the degrees of freedom.
c. Find the critical value t0 and identify the rejection region.
d. Find the standardized test statistic t. Sketch a graph.
e. Decide whether to reject the null hypothesis.
f. Interpret the decision in the context of the original claim.
Answer: Page A41
EXAMPLE 5 (See TI-83/84 Plus steps on page 425.)


> Testing μ with a Small Sample


An industrial company claims that the mean pH level of the water in a nearby
river is 6.8. You randomly select 19 water samples and measure the pH of
each. The sample mean and standard deviation are 6.7 and 0.24, respectively.
Is there enough evidence to reject the company’s claim at α = 0.05? Assume
the population is normally distributed.


> Solution


The claim is “the mean pH level is 6.8.” So, the null and alternative hypotheses
are

H0: μ = 6.8 (Claim)
and
Ha: μ ≠ 6.8.

The test is a two-tailed test, the level of significance is α = 0.05, and the degrees
of freedom are d.f. = 19 - 1 = 18. So, the critical values are -t0 = -2.101
and t0 = 2.101. The rejection regions are t < -2.101 and t > 2.101. The
standardized test statistic is

t = (x̄ - μ) / (s/√n)          Because n < 30, use the t-test.
  = (6.7 - 6.8) / (0.24/√19)  Assume μ = 6.8.
  ≈ -1.816.

The graph shows the location of the rejection region and the standardized test
statistic t. Because t is not in the rejection region, you fail to reject the null
hypothesis.

[Figure: t-distribution with both tails shaded, -t0 = -2.101 and t0 = 2.101,
and t ≈ -1.816 marked; 5% Level of Significance]

Interpretation There is not enough evidence at the 5% level of significance to
reject the claim that the mean pH is 6.8.


> Try It Yourself 5


The company also claims that the mean conductivity of the river is 1890
milligrams per liter. The conductivity of a water sample is a measure of the
total dissolved solids in the sample. You randomly select 19 water samples and
measure the conductivity of each. The sample mean and standard deviation
are 2500 milligrams per liter and 700 milligrams per liter, respectively. Is there
enough evidence to reject the company’s claim at α = 0.01? Assume the
population is normally distributed.

a. Identify the claim and state H0 and Ha.
b. Identify the level of significance α and the degrees of freedom.
c. Find the critical values -t0 and t0 and identify the rejection region.
d. Find the standardized test statistic t. Sketch a graph.
e. Decide whether to reject the null hypothesis.
f. Interpret the decision in the context of the original claim.
Answer: Page A42
STUDY TIP


Using a TI-83/84 Plus, you can either enter the original data into a list to
find a P-value or enter the descriptive statistics.

STAT

Choose the TESTS menu.
2: T-Test...

Select the Data input option if you use the original data. Select the Stats
input option if you use the descriptive statistics. In each case, enter the
appropriate values including the corresponding type of hypothesis test
indicated by the alternative hypothesis. Then select Calculate.


> USING P-VALUES WITH t-TESTS


Suppose you wanted to find a P-value given t = 1.98, 15 degrees of freedom, and a
right-tailed test. Using Table 5 in Appendix B, you can determine that P falls between
α = 0.025 and α = 0.05, but you cannot determine an exact value for P. In such
cases, you can use technology to perform a hypothesis test and find exact P-values.


EXAMPLE 6 (Report 30)


> Using P-Values with a t-Test
A Department of Motor Vehicles office claims that the mean wait time is less
than 14 minutes. A random sample of 10 people has a mean wait time of
13 minutes with a standard deviation of 3.5 minutes. At α = 0.10, test the
office’s claim. Assume the population is normally distributed.
> Solution
The claim is “the mean wait time is less than 14 minutes.” So, the null and
alternative hypotheses are

H0: μ ≥ 14 minutes
and
Ha: μ < 14 minutes. (Claim)

The first TI-83/84 Plus display below shows how to set up the hypothesis
test. The other two displays show the possible results, depending on
whether you select “Calculate” or “Draw.”

TI-83/84 PLUS
T-Test
Inpt: Data Stats
μ0: 14
x̄: 13
Sx: 3.5
n: 10
μ: ≠μ0 <μ0 >μ0
Calculate Draw

TI-83/84 PLUS
T-Test
μ<14
t=-.9035079029
p=.1948994027
x̄=13
Sx=3.5
n=10

TI-83/84 PLUS
[Draw display: t-distribution graph]

From the displays, you can see that P ≈ 0.1949. Because the P-value is greater
than α = 0.10, you fail to reject the null hypothesis.


Interpretation There is not enough evidence at the 10% level of significance
to support the office’s claim that the mean wait time is less than 14 minutes.
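
For readers without a TI-83/84 Plus, the P-value in this example can be approximated with a short script. This is a minimal sketch, assuming plain Python with only the standard library: it integrates the t-density with Simpson's rule instead of calling a statistics package, and the function names `t_pdf`, `t_cdf`, and `t_p_value` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] plus the symmetry of the density."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_p_value(t, df, tail):
    """P-value for a left-, right-, or two-tailed t-test."""
    if tail == "left":
        return t_cdf(t, df)
    if tail == "right":
        return 1 - t_cdf(t, df)
    if tail == "two":
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Example 6: xbar = 13, hypothesized mean 14, s = 3.5, n = 10, left-tailed
t = (13 - 14) / (3.5 / math.sqrt(10))
p = t_p_value(t, 10 - 1, "left")
alpha = 0.10
print(round(t, 4), round(p, 4))
print("reject H0" if p <= alpha else "fail to reject H0")
```

The same function applied to t = 1.98 with 15 degrees of freedom and a right-tailed test gives a value between 0.025 and 0.05, which agrees with the bounds read from Table 5.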


> Try It Yourself 6 


Another Department of Motor Vehicles office claims that the mean wait time
is at most 18 minutes. A random sample of 12 people has a mean wait time of
15 minutes with a standard deviation of 2.2 minutes. At α = 0.05, test the
office’s claim. Assume the population is normally distributed.


a. Identify the claim and state H0 and Ha.
b. Use a TI-83/84 Plus to find the P-value.
c. Compare the P-value with the level of significance α and make a decision.
d. Interpret the decision in the context of the original claim.
Answer: Page A42
EXERCISES


BUILDING BASIC SKILLS AND VOCABULARY


1. Explain how to find critical values for a t-sampling distribution.

2. Explain how to use a t-test to test a hypothesized mean μ given a small
sample (n < 30). What assumption about the population is necessary?


In Exercises 3-8, find the critical value(s) for the indicated t-test, level of
significance α, and sample size n.

3. Right-tailed test, α = 0.05, n = 23     4. Right-tailed test, α = 0.01, n = 11
5. Left-tailed test, α = 0.10, n = 20      6. Left-tailed test, α = 0.01, n = 28
7. Two-tailed test, α = 0.05, n = 27       8. Two-tailed test, α = 0.10, n = 22


Graphical Analysis In Exercises 9-12, state whether the standardized test
statistic t indicates that you should reject the null hypothesis. Explain.

 9. (a) t = 2.091           10. (a) t = 1.308
    (b) t = 0                   (b) t = -1.389
    (c) t = -1.08               (c) t = 1.650
    (d) t = 2.096               (d) t = -0.998
[Graphs for Exercises 9 and 10: t-distributions with shaded rejection regions]

11. (a) t = -2.502          12. (a) t = 1.705
    (b) t = 2.203               (b) t = 1.755
    (c) t = 2.680               (c) t = -1.585
    (d) t = -2.703              (d) t = 1.745
[Graph for Exercise 11: critical values -t0 = -2.602 and t0 = 2.602]
[Graph for Exercise 12: critical values -t0 = -1.725 and t0 = 1.725]

In Exercises 13-16, use a t-test to test the claim about the population mean μ at
the given level of significance α using the given sample statistics. For each claim,
assume the population is normally distributed.


13. Claim: μ = 15; α = 0.01. Sample statistics: x̄ = 13.9, s = 3.23, n =


| 
a 


14. Claim: μ > 25; α = 0.05. Sample statistics: x̄ = 26.2, s = 2.32, n = 17
15. Claim: μ = 8000; α = 0.01. Sample statistics: x̄ = 7700, s = 450, n = 25
16. Claim: 4 52,200; α = 0.10. Sample statistics: x̄ = 53,220, s = 2700, n = 18
USING AND INTERPRETING CONCEPTS


Testing Claims In Exercises 17-24, (a) write the claim mathematically and
identify H0 and Ha, (b) find the critical value(s) and identify the rejection
region(s), (c) find the standardized test statistic t, (d) decide whether to reject or fail
to reject the null hypothesis, and (e) interpret the decision in the context of the
original claim. If convenient, use technology. For each claim, assume the population
is normally distributed.

17. Used Car Cost A used car dealer says that the mean price of a 2008 Subaru
Forester is $18,000. You suspect this claim is incorrect and find that a random
sample of 15 similar vehicles has a mean price of $18,550 and a standard
deviation of $1767. Is there enough evidence to reject the claim at α = 0.05?
(Adapted from Kelley Blue Book)

18. IRS Wait Times The Internal Revenue Service claims that the mean
wait time for callers during a recent tax filing season was at most 7 minutes.
A random sample of 11 callers has a mean wait time of 8.7 minutes and a
standard deviation of 2.7 minutes. Is there enough evidence to reject the
claim at α = 0.10? (Adapted from Internal Revenue Service)

19. Work Hours A medical board claims that the mean number of hours
worked per week by surgical faculty who teach at an academic institution
is more than 60 hours. The hours worked include teaching hours as well as
regular working hours. A random sample of 7 surgical faculty has a mean
hours worked per week of 70 hours and a standard deviation of 12.5 hours.
At α = 0.05, do you have enough evidence to support the board’s claim?
(Adapted from Journal of the American College of Surgeons)

20. Battery Life A company claims that the mean battery life of their MP3
player is at least 30 hours. You suspect this claim is incorrect and find that a
random sample of 18 MP3 players has a mean battery life of 28.5 hours and
a standard deviation of 1.7 hours. Is there enough evidence to reject the claim
at α = 0.01?

21. Waste Recycled An environmentalist estimates that the mean amount
of waste recycled by adults in the United States is more than 1 pound
per person per day. You want to test this claim. You find that the mean
waste recycled per person per day for a random sample of 13 adults in the
United States is 1.50 pounds and the standard deviation is 0.28 pound. At
α = 0.10, can you support the claim? (Adapted from U.S. Environmental
Protection Agency)

22. Waste Generated As part of your work for an environmental awareness
group, you want to test a claim that the mean amount of waste generated
by adults in the United States is more than 4 pounds per day. In a random
sample of 22 adults in the United States, you find that the mean waste
generated per person per day is 4.50 pounds with a standard deviation of
1.21 pounds. At α = 0.01, can you support the claim? (Adapted from
U.S. Environmental Protection Agency)

23. Annual Pay An employment information service claims the mean annual
salary for full-time male workers over age 25 and without a high school
diploma is $26,000. The annual salaries for a random sample of 10 full-time
male workers without a high school diploma are listed. At α = 0.05, test the
claim that the mean salary is $26,000. (Adapted from U.S. Bureau of Labor
Statistics)


26,185 23,814 22,374 25,189 26,318
20,767 30,782 29,541 24,597 28,955
24. Annual Pay An employment information service claims the mean annual
salary for full-time female workers over age 25 and without a high school
diploma is more than $18,500. The annual salaries for a random sample of
12 full-time female workers without a high school diploma are listed. At
α = 0.10, is there enough evidence to support the claim that the mean salary
is more than $18,500? (Adapted from U.S. Bureau of Labor Statistics)


18,665 16,312 18,794 19,403 20,864 19,177
17,328 21,445 20,354 19,143 18,316 19,237


Testing Claims Using P-Values In Exercises 25-30, (a) write the claim
mathematically and identify H0 and Ha, (b) use technology to find the P-value,
(c) decide whether to reject or fail to reject the null hypothesis, and (d) interpret the
decision in the context of the original claim. Assume the population is normally
distributed.


25. Speed Limit A county is considering raising the speed limit on a road 
because they claim that the mean speed of vehicles is greater than 45 miles 
per hour. A random sample of 25 vehicles has a mean speed of 48 miles per 
hour and a standard deviation of 5.4 miles per hour. At α = 0.10, do you 
have enough evidence to support the county’s claim? 


26. Oil Changes A repair shop believes that people travel more than 3500 
miles between oil changes. A random sample of 8 cars getting an oil change 
has a mean distance of 3375 miles since having an oil change with a standard 
deviation of 225 miles. At α = 0.05, do you have enough evidence to support 
the shop’s claim? 


27. Meal Cost A travel association claims that the mean daily meal cost for two 
adults traveling together on vacation in San Francisco is $105. A random 
sample of 20 such groups of adults has a mean daily meal cost of $110 and a 
standard deviation of $8.50. Is there enough evidence to reject the claim at 
α = 0.01? (Adapted from American Automobile Association) 


28. Lodging Cost A travel association claims that the mean daily lodging 
cost for two adults traveling together on vacation in San Francisco is at 
least $240. A random sample of 24 such groups of adults has a mean daily 
lodging cost of $233 and a standard deviation of $12.50. Is there enough 
evidence to reject the claim at α = 0.10? (Adapted from American Automobile 
Association) 


29. Class Size You receive a brochure from a large university. The brochure 
indicates that the mean class size for full-time faculty is fewer than 
32 students. You want to test this claim. You randomly select 18 classes taught 
by full-time faculty and determine the class size of each. The results are 
listed below. At α = 0.05, can you support the university’s claim? 


35 28 29 33 32 40 26 25 29 
28 30 36 33 29 27 30 28 25 


30. Faculty Classroom Hours The dean of a university estimates that the mean 
number of classroom hours per week for full-time faculty is 11.0. As a 
member of the student council, you want to test this claim. A random sample 
of the number of classroom hours for eight full-time faculty for one week is 
listed below. At α = 0.01, can you reject the dean’s claim? 


11.8 8.6 12.6 7.9 6.4 10.4 13.6 9.1 


In Exercises 31-34, use StatCrunch and a t-test to help you test the claim 
about the population mean μ at the given level of significance α using the given 
sample statistics. For each claim, assume the population is normally distributed. 


31. Claim: μ = 75; α = 0.05. Sample statistics: x̄ = 73.6, s = 3.2, n = 26 
32. Claim: μ ≠ 27; α = 0.01. Sample statistics: x̄ = 31.5, s = 4.7, n = 12 
33. Claim: μ < 188; α = 0.05. Sample statistics: x̄ = 186, s = 12, n = 9 
34. Claim: μ = 2118; α = 0.10. Sample statistics: x̄ = 1787, s = 384, n = 17 


EXTENDING CONCEPTS 


35. Credit Card Balances To test the claim that the mean credit card debt 
for individuals is greater than $5000, you do some research and find that 
a random sample of 6 cardholders has a mean credit card balance of 
$5434 with a standard deviation of $625. You conduct a statistical experiment 
where H₀: μ ≤ $5000 and Hₐ: μ > $5000. At α = 0.05, explain why you 
cannot reject H₀. Assume the population is normally distributed. (Adapted 
from TransUnion) 


36. Using Different Values of α and n In Exercise 35, you believe that H₀ 
is not valid. Which of the following allows you to reject H₀? Explain your 
reasoning. 

(a) Use the same values but decrease α from 0.05 to 0.01. 

(b) Use the same values but increase α from 0.05 to 0.10. 

(c) Use the same values but increase n from 6 to 8. 

(d) Use the same values but increase n from 6 to 24. 


Deciding on a Distribution In Exercises 37 and 38, decide whether you 
should use a normal sampling distribution or a t-sampling distribution to perform 
the hypothesis test. Justify your decision. Then use the distribution to test the claim. 
Write a short paragraph about the results of the test and what you can conclude 
about the claim. 


37. Gas Mileage A car company says that the mean gas mileage for its luxury 
sedan is at least 23 miles per gallon (mpg). You believe the claim is incorrect 
and find that a random sample of 5 cars has a mean gas mileage of 
22 mpg and a standard deviation of 4 mpg. At α = 0.05, test the company’s 
claim. Assume the population is normally distributed. 


38. Private Law School An education publication claims that the average 
in-state tuition for one year of law school at a private institution is more than 
$35,000. A random sample of 50 private law schools has a mean in-state 
tuition of $34,967 and a standard deviation of $5933 for one year. At 
α = 0.01, test the publication’s claim. Assume the population is normally 
distributed. (Adapted from U.S. News and World Report) 


39. Writing You are testing a claim and incorrectly use the normal sampling 
distribution instead of the t-sampling distribution. Does this make it more or 
less likely to reject the null hypothesis? Is this result the same no matter 
whether the test is left-tailed, right-tailed, or two-tailed? Explain your 
reasoning. 


APPLET 


Hypothesis Tests for a Mean 


The hypothesis tests for a mean applet allows you to visually investigate 
hypothesis tests for a mean. You can specify the sample size n, the shape of the 
distribution (Normal or Right skewed), the true population mean (Mean), the true 
population standard deviation (Std. Dev.), the null value for the mean (Null mean), 
and the alternative for the test (Alternative). When you click SIMULATE, 
100 separate samples of size n will be selected from a population with these 
population parameters. For each of the 100 samples, a hypothesis test based on 
the T statistic is performed, and the results from each test are displayed in the 
plots at the right. The test statistic for each test is shown in the top plot and the 
P-value is shown in the bottom plot. The green and blue lines represent the cutoffs 
for rejecting the null hypothesis with the 0.05 and 0.01 level tests, respectively. 
Additional simulations can be carried out by clicking SIMULATE multiple 
times. The cumulative number of times that each test rejects the null hypothesis 
is also shown. Press CLEAR to clear existing results and start a new simulation. 


Explore 

Step 1 Specify a value for n. 
Step 2 Specify a distribution. 
Step 3 Specify a value for the mean. 
Step 4 Specify a value for the standard deviation. 
Step 5 Specify a value for the null mean. 
Step 6 Specify an alternative hypothesis. 
Step 7 Click SIMULATE to generate the hypothesis tests. 

[Applet panel: input fields for n, Distribution (Normal), Mean (50), Std. Dev. (10), Null mean (50), and Alternative (<); Simulate and Clear buttons; a cumulative results table listing Reject null, Fail to reject null, and Prop. rejected at the 0.05 and 0.01 levels.] 


Draw Conclusions 


1. Set n = 15, Mean = 40, Std. Dev. = 5, Null mean = 40, alternative hypothesis 
to “not equal,” and the distribution to “Normal.” Run the simulation so 
that at least 1000 hypothesis tests are run. Compare the proportion of null 
hypothesis rejections for the 0.05 level and the 0.01 level. Is this what you 
would expect? Explain. 


2. Suppose a null hypothesis is rejected at the 0.01 level. Will it be rejected at 
the 0.05 level? Explain. Suppose a null hypothesis is rejected at the 0.05 level. 
Will it be rejected at the 0.01 level? Explain. 


3. Set n = 25, Mean = 25, Std. Dev. = 3, Null mean = 27, alternative hypothesis 
to “<,” and the distribution to “Normal.” What is the null hypothesis? Run the 
simulation so that at least 1000 hypothesis tests are run. Compare the 
proportion of null hypothesis rejections for the 0.05 level and the 0.01 level. 
Is this what you would expect? Explain. 
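
The applet itself is not reproduced here, but its behavior in Draw Conclusions item 1 can be approximated with a short simulation. The sketch below is an illustration under stated assumptions, not the applet's actual code: the function name `simulate_t_tests`, the seed, and the hard-coded critical values (two-tailed t-values for d.f. = 14 from a standard t-table, so valid only for n = 15) are all choices made for this example.

```python
import math
import random
import statistics

# Two-tailed critical t-values for d.f. = 14 (n = 15), from a standard t-table.
# These are only valid for samples of size 15.
T_CRIT = {0.05: 2.145, 0.01: 2.977}


def simulate_t_tests(n=15, true_mean=40.0, sd=5.0, null_mean=40.0,
                     trials=1000, seed=1):
    """Return the proportion of two-tailed t-tests that reject H0 at each level."""
    rng = random.Random(seed)
    rejections = {level: 0 for level in T_CRIT}
    for _ in range(trials):
        # Draw one sample of size n from a normal population.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        x_bar = statistics.mean(sample)
        s = statistics.stdev(sample)
        # Standardized test statistic t = (x_bar - mu) / (s / sqrt(n)).
        t = (x_bar - null_mean) / (s / math.sqrt(n))
        for level, crit in T_CRIT.items():
            if abs(t) > crit:
                rejections[level] += 1
    return {level: count / trials for level, count in rejections.items()}


proportions = simulate_t_tests()
print(proportions)
```

Because the null mean equals the true mean in this setup, every rejection is a type I error, so the two proportions should land near 0.05 and 0.01 respectively, with some run-to-run variation.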


CHAPTER 7 HYPOTHESIS TESTING WITH ONE SAMPLE 


7.4 Hypothesis Testing for Proportions 


WHAT YOU SHOULD LEARN 


» How to use the z-test to test 
a population proportion p 


» HYPOTHESIS TEST FOR PROPORTIONS 


In Sections 7.2 and 7.3, you learned how to perform a hypothesis test for a 
population mean. In this section, you will learn how to test a population 
proportion p. 

Hypothesis tests for proportions can be used when politicians want to know 
the proportion of their constituents who favor a certain bill or when quality 
assurance engineers test the proportion of parts that are defective. 

If np ≥ 5 and nq ≥ 5 for a binomial distribution, then the sampling 
distribution for p̂ is approximately normal with a mean of 

$$ \mu_{\hat{p}} = p $$

and a standard error of 

$$ \sigma_{\hat{p}} = \sqrt{pq/n}. $$


z-TEST FOR A PROPORTION p 


The z-test for a proportion is a statistical test for a population proportion p. 
The z-test can be used when a binomial distribution is given such that np ≥ 5 
and nq ≥ 5. The test statistic is the sample proportion p̂ and the standardized 
test statistic is 

$$ z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{pq/n}}. $$


INSIGHT 


A hypothesis test for a proportion p can also be performed using 
P-values. Use the guidelines on page 373 for using P-values 
for a z-test for a mean μ, but in Step 3 find the standardized test 
statistic by using the formula 

$$ z = \frac{\hat{p} - p}{\sqrt{pq/n}}. $$

The other steps in the test are the same. 


GUIDELINES 


Using a z-Test for a Proportion p 

Verify that np ≥ 5 and nq ≥ 5. 

1. State the claim mathematically and verbally. Identify the null 
and alternative hypotheses. (In symbols: State H₀ and Hₐ.) 

2. Specify the level of significance. (In symbols: Identify α.) 

3. Determine the critical value(s). (Use Table 4 in Appendix B.) 

4. Determine the rejection region(s). 

5. Find the standardized test statistic and sketch the sampling 
distribution. (In symbols: $z = \frac{\hat{p} - p}{\sqrt{pq/n}}$.) 

6. Make a decision to reject or fail to reject the null hypothesis. 
(If z is in the rejection region, reject H₀. Otherwise, fail to reject H₀.) 

7. Interpret the decision in the context of the original claim. 


To explore this topic further, 
see Activity 7.4 on page 403. 
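
The guidelines above can be sketched in code. The following is a minimal illustration, not part of the text: the function name `z_test_proportion` and its parameters are hypothetical, and it obtains critical values from Python's `statistics.NormalDist` rather than from Table 4, so they may differ from the table in the last rounded digit. The numbers in the usage line are illustrative.

```python
import math
from statistics import NormalDist


def z_test_proportion(p_hat, p0, n, alpha, tail):
    """z-test for a proportion; tail is 'left', 'right', or 'two'.

    Returns (z, critical value, reject H0?). For a two-tailed test the
    returned critical value is the positive one; the other is its negative.
    """
    q0 = 1 - p0
    # The normal approximation requires np >= 5 and nq >= 5.
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("np and nq must both be at least 5")

    # Standardized test statistic z = (p_hat - p) / sqrt(pq/n).
    z = (p_hat - p0) / math.sqrt(p0 * q0 / n)

    std_normal = NormalDist()
    if tail == "left":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, critical, reject


z, critical, reject = z_test_proportion(p_hat=0.39, p0=0.5, n=100,
                                        alpha=0.01, tail="left")
print(round(z, 2), round(critical, 2), reject)  # -2.2 -2.33 False
```

Raising an error when np or nq is below 5 mirrors the first guideline: the z-test is not appropriate unless the normal approximation holds.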


STUDY TIP 


Remember that if the sample 
proportion is not given, you 
can find it using 

$$ \hat{p} = \frac{x}{n} $$

where x is the number 
of successes in the 
sample and n is the 
sample size. 


STUDY TIP 


Remember that when you fail 
to reject H₀, a type II error 
is possible. For instance, 
in Example 1 the null 
hypothesis, p ≥ 0.5, 
may be false. 


EXAMPLE 1 (See TI-83/84 Plus steps on page 425.) 


» Hypothesis Test for a Proportion 


A research center claims that less than 50% of U.S. adults have accessed the 
Internet over a wireless network with a laptop computer. In a random sample 
of 100 adults, 39% say they have accessed the Internet over a wireless network 
with a laptop computer. At α = 0.01, is there enough evidence to support the 
researcher’s claim? (Adapted from Pew Research Center) 


> Solution The products np = 100(0.50) = 50 and nq = 100(0.50) = 50 
are both greater than 5. So, you can use a z-test. The claim is “less than 50% 
have accessed the Internet over a wireless network with a laptop computer.” 
So, the null and alternative hypotheses are 


H₀: p ≥ 0.5 and Hₐ: p < 0.5. (Claim) 


Because the test is a left-tailed test and the level of significance is α = 0.01, 
the critical value is z₀ = −2.33 and the rejection region is z < −2.33. The 
standardized test statistic is 

$$ z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.39 - 0.5}{\sqrt{(0.5)(0.5)/100}} = -2.2. $$

Because np ≥ 5 and nq ≥ 5, you can use the z-test. The calculation assumes p = 0.5. 


The graph shows the location of the 
rejection region and the standardized test 
statistic z. Because z is not in the rejection 
region, you should fail to reject the null 
hypothesis. 

[Figure: standard normal curve at the 1% level of significance, showing the critical value z₀ = −2.33 and the standardized test statistic z = −2.2.] 


Interpretation There is not enough evidence at the 1% level of significance 
to support the claim that less than 50% of U.S. adults have accessed the 
Internet over a wireless network with a laptop computer. 


> Try It Yourself 1 


A research center claims that more than 25% of U.S. adults have used a cellular 
phone to access the Internet. In a random sample of 125 adults, 32% say they 
have used a cellular phone to access the Internet. At α = 0.05, is there enough 
evidence to support the researcher’s claim? (Adapted from Pew Research Center) 


a. Verify that np ≥ 5 and nq ≥ 5. 
b. Identify the claim and state H₀ and Hₐ. 
c. Identify the level of significance α. 
d. Find the critical value z₀ and identify the rejection region. 
e. Find the standardized test statistic z. Sketch a graph. 
f. Decide whether to reject the null hypothesis. 
g. Interpret the decision in the context of the original claim. 

Answer: Page A42 


To use a P-value to perform the hypothesis test in Example 1, use Table 4 
to find the area corresponding to z = −2.2. The area is 0.0139. Because this is a 
left-tailed test, the P-value is equal to the area to the left of z = −2.2. So, 
P = 0.0139. Because the P-value is greater than α = 0.01, you should fail to reject 
the null hypothesis. Note that this is the same result obtained in Example 1. 
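
The table lookup in the paragraph above can be checked numerically. This is a small sketch that swaps Table 4 for Python's `statistics.NormalDist`; the variable names are chosen for this example only.

```python
from statistics import NormalDist

z = -2.2       # standardized test statistic from Example 1
alpha = 0.01   # level of significance from Example 1

# Left-tailed test: the P-value is the area to the left of z
# under the standard normal curve.
p_value = NormalDist().cdf(z)

# Reject H0 only when the P-value is at most alpha.
reject = p_value <= alpha
print(round(p_value, 4), reject)  # 0.0139 False
```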


A recent survey claimed that at 
least 70% of U.S. adults believe 
that cloning animals is morally 
wrong. To test this claim, you 
conduct a random telephone 
survey of 300 U.S. adults. In the 
survey, you find that 189 adults 
believe that cloning animals is 
morally wrong. (Adapted from The 
Gallup Poll) 

[Figure: chart with two categories, “Cloning animals is morally wrong” and “Cloning animals is morally acceptable.”] 

At α = 0.05, is there enough 
evidence to reject the claim? 


EXAMPLE 2 (See MINITAB steps on page 424.) 


» Hypothesis Test for a Proportion 


A research center claims that 25% of college graduates think a college degree 
is not worth the cost. You decide to test this claim and ask a random sample of 
200 college graduates whether they think a college degree is not worth the 
cost. Of those surveyed, 21% reply yes. At α = 0.10, is there enough evidence 
to reject the claim? (Adapted from Zogby International) 


> Solution 

The products np = 200(0.25) = 50 and nq = 200(0.75) = 150 are both greater 
than 5. So, you can use a z-test. The claim is “25% of college graduates think a 
college degree is not worth the cost.” So, the null and alternative hypotheses 
are 


H₀: p = 0.25 (Claim) and Hₐ: p ≠ 0.25. 


Because the test is a two-tailed test and the level of significance is α = 0.10, 
the critical values are −z₀ = −1.645 and z₀ = 1.645. The rejection regions are 
z < −1.645 and z > 1.645. The standardized test statistic is 

$$ z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.21 - 0.25}{\sqrt{(0.25)(0.75)/200}} \approx -1.31. $$

Because np ≥ 5 and nq ≥ 5, you can use the z-test. The calculation assumes p = 0.25. 


The graph shows the location of the 
rejection regions and the standardized 
test statistic z. Because z is not in the 
rejection region, you should fail to reject 
the null hypothesis. 

[Figure: standard normal curve at the 10% level of significance, showing the critical values ±1.645 and the standardized test statistic z ≈ −1.31.] 


Interpretation There is not enough 
evidence at the 10% level of significance 
to reject the claim that 25% of college 
graduates think a college degree is not 
worth the cost. 


> Try It Yourself 2 


A research center claims that 30% of U.S. adults have not purchased a certain 
brand because they found the advertisements distasteful. You decide to test 
this claim and ask a random sample of 250 U.S. adults whether they have not 
purchased a certain brand because they found the advertisements distasteful. 
Of those surveyed, 36% reply yes. At α = 0.10, is there enough evidence to 
reject the claim? (Adapted from Harris Interactive) 


a. Verify that np ≥ 5 and nq ≥ 5. 
b. Identify the claim and state H₀ and Hₐ. 
c. Identify the level of significance α. 
d. Find the critical values −z₀ and z₀ and identify the rejection regions. 
e. Find the standardized test statistic z. Sketch a graph. 
f. Decide whether to reject the null hypothesis. 
g. Interpret the decision in the context of the original claim. 

Answer: Page A42 


7.4 EXERCISES 


BUILDING BASIC SKILLS AND VOCABULARY 


1. Explain how to decide when a normal distribution can be used to approximate 
a binomial distribution. 

2. Explain how to test a population proportion p. 


In Exercises 3-8, decide whether the normal sampling distribution can be used. If 
it can be used, test the claim about the population proportion p at the given level of 
significance α using the given sample statistics. 

3. Claim: p < 0.12; α = 0.01. Sample statistics: p̂ = 0.10, n = 40 
4. Claim: p = 0.48; α = 0.08. Sample statistics: p̂ = 0.40, n = 90 
5. Claim: p ≠ 0.15; α = 0.05. Sample statistics: p̂ = 0.12, n = 500 
6. Claim: p > 0.70; α = 0.04. Sample statistics: p̂ = 0.64, n = 225 
7. Claim: p ≤ 0.45; α = 0.05. Sample statistics: p̂ = 0.52, n = 100 
8. Claim: p = 0.95; α = 0.10. Sample statistics: p̂ = 0.875, n = 50 


USING AND INTERPRETING CONCEPTS 


Testing Claims In Exercises 9-16, (a) write the claim mathematically and 
identify H₀ and Hₐ, (b) find the critical value(s) and identify the rejection 
region(s), (c) find the standardized test statistic z, (d) decide whether to reject or 
fail to reject the null hypothesis, and (e) interpret the decision in the context of the 
original claim. If convenient, use technology to find the standardized test statistic. 


9. Smokers A medical researcher says that less than 25% of U.S. adults are 
smokers. In a random sample of 200 U.S. adults, 18.5% say that they 
are smokers. At α = 0.05, is there enough evidence to reject the researcher’s 
claim? (Adapted from National Center for Health Statistics) 


10. Census A research center claims that at least 40% of U.S. adults think the 
Census count is accurate. In a random sample of 600 U.S. adults, 35% say that 
the Census count is accurate. At α = 0.02, is there enough evidence to reject 
the center’s claim? (Adapted from Rasmussen Reports) 


11. Cellular Phones and Driving A research center claims that at most 50% 
of people believe that drivers should be allowed to use cellular phones 
with hands-free devices while driving. In a random sample of 150 U.S. 
adults, 58% say that drivers should be allowed to use cellular phones with 
hands-free devices while driving. At α = 0.01, is there enough evidence to 
reject the center’s claim? (Adapted from Rasmussen Reports) 


12. Asthma A medical researcher claims that 5% of children under 18 years of 
age have asthma. In a random sample of 250 children under 18 years of age, 
9.6% say they have asthma. At α = 0.08, is there enough evidence to reject 
the researcher’s claim? (Adapted from National Center for Health Statistics) 


13. Female Height A research center claims that more than 75% of females 
ages 20-29 are taller than 62 inches. In a random sample of 150 females ages 
20-29, 82% are taller than 62 inches. At α = 0.10, is there enough evidence 
to support the center’s claim? (Adapted from National Center for Health Statistics) 


14. Curling A research center claims that 16% of U.S. adults say that curling 
is the Winter Olympic sport they would like to try the most. In a random 
sample of 300 U.S. adults, 20% say that curling is the Winter Olympic sport 
they would like to try the most. At α = 0.05, is there enough evidence to 
reject the researcher’s claim? (Adapted from Zogby International) 


15. Dog Ownership A humane society claims that less than 35% of 
U.S. households own a dog. In a random sample of 400 U.S. households, 
156 say they own a dog. At α = 0.10, is there enough evidence to support 
the society’s claim? (Adapted from The Humane Society of the United States) 


16. Cat Ownership A humane society claims that 30% of U.S. households own 
a cat. In a random sample of 200 U.S. households, 72 say they own a cat. At 
α = 0.05, is there enough evidence to reject the society’s claim? (Adapted 
from The Humane Society of the United States) 


Free Samples In Exercises 17 and 18, use the graph, which shows what 
adults think about the effectiveness of free samples. 

[Figure: “How effective adults say free samples are.” Legible categories include “More likely to buy product” and “Nice, but not necessary”; the values 3% and 25% are legible, and one category label is illegible in the source.] 


17. Do Free Samples Work? You interview a random sample of 50 
adults. The results of the survey show that 48% of the adults said 
they were more likely to buy a product when there are free 
samples. At α = 0.05, can you reject the claim that at least 52% 
of adults are more likely to buy a product when there are free 
samples? 


18. Should Free Samples Be Used? Use your conclusion from Exercise 17 to 
write a paragraph on the use of free samples. Do you think a company should 
use free samples to get people to buy a product? Explain. 


EXTENDING CONCEPTS 


Alternative Formula In Exercises 19 and 20, use the following information. 
When you know the number of successes x, the sample size n, and the population 
proportion p, it can be easier to use the formula 

$$ z = \frac{x - np}{\sqrt{npq}} $$

to find the standardized test statistic when using a z-test for a population 
proportion p. 


19. Rework Exercise 15 using the alternative formula and compare the results. 


20. The alternative formula is derived from the formula 

$$ z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{(x/n) - p}{\sqrt{pq/n}}. $$

Use this formula to derive the alternative formula. Justify each step. 


APPLET 


Hypothesis Tests for a Proportion 


The hypothesis tests for a proportion applet allows you to visually investigate 
hypothesis tests for a population proportion. You can specify the sample size n, 
the population proportion (True p), the null value for the proportion (Null p), 
and the alternative for the test (Alternative). When you click SIMULATE, 100 
separate samples of size n will be selected from a population with a proportion 
of successes equal to True p. For each of the 100 samples, a hypothesis test based 
on the Z statistic is performed, and the results from each test are displayed in 
plots at the right. The standardized test statistic for each test is shown in the top 
plot and the P-value is shown in the bottom plot. The green and blue lines 
represent the cutoffs for rejecting the null hypothesis with the 0.05 and 0.01 level 
tests, respectively. Additional simulations can be carried out by clicking 
SIMULATE multiple times. The cumulative number of times that each test 
rejects the null hypothesis is also shown. Press CLEAR to clear existing results 
and start a new simulation. 


[Applet panel: input fields for n (100), True p (0.5), Null p (0.5), and Alternative (<); Simulate and Clear buttons; a cumulative results table listing Reject null, Fail to reject null, and Prop. rejected at the 0.05 and 0.01 levels.] 


Explore 


Step 1 Specify a value for n. 

Step 2 Specify a value for True p. 

Step 3 Specify a value for Null p. 

Step 4 Specify an alternative hypothesis. 

Step 5 Click SIMULATE to generate the hypothesis tests. 


Draw Conclusions 


1. Set n = 25, True p = 0.35, Null p = 0.35, and the alternative hypothesis to 
“not equal.” Run the simulation so that at least 1000 hypothesis tests are run. 
Compare the proportion of null hypothesis rejections for the 0.05 level and the 
0.01 level. Is this what you would expect? Explain. 


2. Set n = 50, True p = 0.6, Null p = 0.4, and the alternative hypothesis to 
“<.” What is the null hypothesis? Run the simulation so that at least 1000 
hypothesis tests are run. Compare the proportion of null hypothesis rejections 
for the 0.05 level and the 0.01 level. Perform a hypothesis test for each level. 
Use the results of the hypothesis tests to explain the results of the simulation. 
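
For readers without access to the applet, the experiment in Draw Conclusions item 1 can be approximated in a few lines. This is a rough sketch, not the applet's implementation: `simulate_z_tests`, the seed, and the rounded two-tailed critical values 1.960 and 2.576 are assumptions made for this example.

```python
import math
import random

# Two-tailed critical z-values for the 0.05 and 0.01 levels.
Z_CRIT = {0.05: 1.960, 0.01: 2.576}


def simulate_z_tests(n=25, true_p=0.35, null_p=0.35, trials=1000, seed=1):
    """Return the proportion of two-tailed z-tests that reject H0 at each level."""
    rng = random.Random(seed)
    # The standard error is computed from the null proportion.
    std_error = math.sqrt(null_p * (1 - null_p) / n)
    rejections = {level: 0 for level in Z_CRIT}
    for _ in range(trials):
        # Count successes in one simulated sample of size n.
        successes = sum(rng.random() < true_p for _ in range(n))
        z = (successes / n - null_p) / std_error
        for level, crit in Z_CRIT.items():
            if abs(z) > crit:
                rejections[level] += 1
    return {level: count / trials for level, count in rejections.items()}


print(simulate_z_tests())
```

Because the number of successes is discrete and n is small, the observed rejection proportions need not match 0.05 and 0.01 exactly, which is one of the points the activity invites you to notice.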


7.5 Hypothesis Testing for Variance and Standard Deviation 


Critical Values for a χ²-Test » The Chi-Square Test 


WHAT YOU SHOULD LEARN 


» How to find critical values for a χ²-test 

» How to use the χ²-test to test a variance or a standard deviation 


» CRITICAL VALUES FOR A χ²-TEST 


In real life, it is often important to produce consistent predictable results. For 
instance, consider a company that manufactures golf balls. The manufacturer 
must produce millions of golf balls, each having the same size and the same 
weight. There is a very low tolerance for variation. If the population is normal, 
you can test the variance and standard deviation of the process using the 
chi-square distribution with n − 1 degrees of freedom. 


GUIDELINES 


Finding Critical Values for the χ²-Test 
1. Specify the level of significance α. 
2. Determine the degrees of freedom d.f. = n − 1. 
3. The critical values for the χ²-distribution are found in Table 6 in 
Appendix B. To find the critical value(s) for a 
a. right-tailed test, use the value that corresponds to d.f. and α. 
b. left-tailed test, use the value that corresponds to d.f. and 1 − α. 
c. two-tailed test, use the values that correspond to d.f. and ½α, and d.f. 
and 1 − ½α. 

[Figures: χ²-distributions showing the critical value χ²₀ for a right-tailed test, the critical value χ²₀ for a left-tailed test, and the critical values χ²_L and χ²_R for a two-tailed test.] 


EXAMPLE 1 


> Finding Critical Values for χ² 
Find the critical χ²-value for a right-tailed test when n = 26 and α = 0.10. 


> Solution 
The degrees of freedom are 

d.f. = n − 1 = 26 − 1 = 25. 

The graph at the right shows a χ²-distribution with 25 degrees of 
freedom and a shaded area of α = 0.10 in 
the right tail. In Table 6 in Appendix B 
with d.f. = 25 and α = 0.10, the critical 
value is 

χ²₀ = 34.382. 

[Figure: χ²-distribution with 25 degrees of freedom, area α = 0.10 shaded in the right tail, critical value χ²₀ = 34.382.] 


> Try It Yourself 1 
Find the critical χ²-value for a right-tailed test when n = 18 and α = 0.01. 


a. Identify the degrees of freedom and the level of significance. 
b. Use Table 6 in Appendix B to find the critical χ²-value. Answer: Page A42 


EXAMPLE 2 


> Finding Critical Values for χ² 
Find the critical χ²-value for a left-tailed test when n = 11 and α = 0.01. 


> Solution 
The degrees of freedom are 

d.f. = n − 1 = 11 − 1 = 10. 

The graph shows a χ²-distribution with 
10 degrees of freedom and a shaded area 
of α = 0.01 in the left tail. The area to the 
right of the critical value is 

1 − α = 1 − 0.01 = 0.99. 

In Table 6 with d.f. = 10 and the 
area 1 − α = 0.99, the critical value is 

χ²₀ = 2.558. 

[Figure: χ²-distribution with 10 degrees of freedom, area α = 0.01 shaded in the left tail, critical value χ²₀ = 2.558.] 


STUDY TIP 

Note that because chi-square 
distributions are not symmetric 
(like normal or t-distributions), 
in a two-tailed test the two 
critical values are not 
opposites. Each critical 
value must be calculated 
separately. 


> Try It Yourself 2 
Find the critical χ²-value for a left-tailed test when n = 30 and α = 0.05. 


a. Identify the degrees of freedom and the level of significance. 
b. Use Table 6 in Appendix B to find the critical χ²-value. Answer: Page A42 


EXAMPLE 3 


> Finding Critical Values for χ² 
Find the critical χ²-values for a two-tailed test when n = 9 and α = 0.05. 


> Solution 
The degrees of freedom are 

d.f. = n − 1 = 9 − 1 = 8. 

The graph shows a χ²-distribution with 
8 degrees of freedom and a shaded area 
of ½α = 0.025 in each tail. The areas to 
the right of the critical values are 

½α = 0.025 

and 

1 − ½α = 0.975. 

In Table 6 with d.f. = 8 and the areas 0.025 and 0.975, the critical values are 
χ²_L = 2.180 and χ²_R = 17.535. 

> Try It Yourself 3 
Find the critical χ²-values for a two-tailed test when n = 51 and α = 0.01. 
a. Identify the degrees of freedom and the level of significance. 
b. Find the first critical value χ²_R using Table 6 in Appendix B and the area ½α. 
c. Find the second critical value χ²_L using Table 6 in Appendix B and the area 
1 − ½α. Answer: Page A42 
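
The text finds χ² critical values from Table 6. As a cross-check on such lookups, the sketch below computes them numerically using only the Python standard library. It is an illustration, not the method of the text: the helper names `chi2_cdf` and `chi2_critical` are hypothetical, the cumulative area is evaluated with a series for the regularized incomplete gamma function, and the critical value is located by bisection.

```python
import math


def chi2_cdf(x, df):
    """Area to the left of x under the chi-square curve with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, h = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, h).
    term, total, k = 1.0, 1.0, 0
    while term > 1e-15 * total:
        k += 1
        term *= h / (a + k)
        total += term
    return math.exp(a * math.log(h) - h - math.lgamma(a + 1.0)) * total


def chi2_critical(area_right, df):
    """Value with the given area to its right, matching how Table 6 is indexed."""
    lo, hi = 0.0, float(df)
    # Expand the upper bound until it lies beyond the critical value.
    while 1.0 - chi2_cdf(hi, df) > area_right:
        hi *= 2.0
    # Bisection on the right-tail area.
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > area_right:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Example 1: right-tailed, d.f. = 25, alpha = 0.10
print(f"{chi2_critical(0.10, 25):.3f}")
# Example 2: left-tailed, d.f. = 10, area to the right 1 - alpha = 0.99
print(f"{chi2_critical(0.99, 10):.3f}")
# Example 3: two-tailed, d.f. = 8, areas 0.975 and 0.025
print(f"{chi2_critical(0.975, 8):.3f}", f"{chi2_critical(0.025, 8):.3f}")
```

The printed values should agree with the table values quoted in Examples 1 through 3 to about three decimal places. Indexing by the area to the right keeps the function aligned with the guideline that a left-tailed test uses 1 − α.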


> THE CHI-SQUARE TEST 


To test a variance σ² or a standard deviation σ of a population that is normally 
distributed, you can use the χ²-test. The χ²-test for a variance or standard 
deviation is not as robust as the tests for the population mean μ or the population 
proportion p. So, it is essential in performing a χ²-test for a variance or standard 
deviation that the population be normally distributed. The results can be misleading 
if the population is not normal. 


χ²-TEST FOR A VARIANCE σ² OR STANDARD 
DEVIATION σ 


The χ²-test for a variance or standard deviation is a statistical test for a 
population variance or standard deviation. The χ²-test can be used when 
the population is normal. The test statistic is s² and the standardized test 
statistic 

$$ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} $$

follows a chi-square distribution with degrees of freedom 

d.f. = n − 1. 


GUIDELINES 


Using the χ²-Test for a Variance or Standard Deviation 

1. State the claim mathematically and verbally. Identify the null 
and alternative hypotheses. (In symbols: State H₀ and Hₐ.) 

2. Specify the level of significance. (In symbols: Identify α.) 

3. Determine the degrees of freedom. (In symbols: d.f. = n − 1.) 

4. Determine the critical value(s). (Use Table 6 in Appendix B.) 

5. Determine the rejection region(s). 

6. Find the standardized test statistic and sketch the sampling 
distribution. (In symbols: $\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$.) 

7. Make a decision to reject or fail to reject the null hypothesis. 
(If χ² is in the rejection region, reject H₀. Otherwise, fail to reject H₀.) 

8. Interpret the decision in the context of the original claim. 


A community center claims that 
the chlorine level in its pool has a 
standard deviation of [value illegible in source] parts 
per million (ppm). A sampling 
of the pool's chlorine levels at 
25 random times during a month 
yields a standard deviation of 
0.61 ppm. (Adapted from American 
Pool Supply) 

[Figure: histogram of chlorine level (ppm).] 

At α = 0.05, is there enough 
evidence to reject the claim? 


EXAMPLE 4 (Report 31) 


> Using a Hypothesis Test for the Population Variance 


A dairy processing company claims that the variance of the amount of fat in 
the whole milk processed by the company is no more than 0.25. You suspect 
this is wrong and find that a random sample of 41 milk containers has a 
variance of 0.27. At α = 0.05, is there enough evidence to reject the company’s 
claim? Assume the population is normally distributed. 


> Solution 

The claim is “the variance is no more than 0.25.” So, the null and alternative 
hypotheses are 

H₀: σ² ≤ 0.25 (Claim) and Hₐ: σ² > 0.25. 

The test is a right-tailed test, the level of significance is α = 0.05, and the 
degrees of freedom are d.f. = 41 − 1 = 40. So, the critical value is 

χ²₀ = 55.758. 

The rejection region is χ² > 55.758. The standardized test statistic is 

$$ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(41 - 1)(0.27)}{0.25} = 43.2. $$

This uses the chi-square test and assumes σ² = 0.25. 


The graph shows the location of the 
rejection region and the standardized test 
statistic χ². Because χ² is not in the 
rejection region, you should fail to reject 
the null hypothesis. 

[Figure: χ²-distribution with α = 0.05 shaded in the right tail, showing χ² = 43.2 and the critical value χ²₀ = 55.758.] 


Interpretation There is not enough 
evidence at the 5% level of significance to 
reject the company’s claim that the 
variance of the amount of fat in the whole 
milk is no more than 0.25. 
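
The arithmetic in Example 4 can be verified with a few lines of code. This is a minimal sketch: the function name `chi_square_statistic` is hypothetical, and the critical value is simply copied from the example (Table 6, d.f. = 40, α = 0.05) rather than computed.

```python
def chi_square_statistic(n, sample_variance, claimed_variance):
    """Standardized test statistic (n - 1) * s^2 / sigma^2."""
    return (n - 1) * sample_variance / claimed_variance


# Values from Example 4: n = 41, s^2 = 0.27, claimed sigma^2 = 0.25.
chi2 = chi_square_statistic(n=41, sample_variance=0.27, claimed_variance=0.25)

# Critical value quoted in Example 4 for a right-tailed test.
critical_value = 55.758

# Right-tailed test: reject H0 only if the statistic exceeds the critical value.
reject = chi2 > critical_value
print(round(chi2, 1), reject)  # 43.2 False
```

The function takes variances, so when a problem gives standard deviations they must be squared before being passed in.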


> Try It Yourself 4 


A bottling company claims that the variance of the amount of sports drink in
a 12-ounce bottle is no more than 0.40. A random sample of 31 bottles has a
variance of 0.75. At α = 0.01, is there enough evidence to reject the company’s
claim? Assume the population is normally distributed.

a. Identify the claim and state H₀ and Hₐ.
b. Identify the level of significance α and the degrees of freedom.
c. Find the critical value and identify the rejection region.
d. Find the standardized test statistic χ².
e. Decide whether to reject the null hypothesis. Use a graph if necessary.
f. Interpret the decision in the context of the original claim.

Answer: Page A42


STUDY TIP

Although you are testing a standard deviation in Example 5, the χ²-statistic
requires variances. Don't forget to square the given standard deviations to
calculate these variances.

EXAMPLE 5 Report 32


» Using a Hypothesis Test for the Standard Deviation

A company claims that the standard deviation of the lengths of time it takes
an incoming telephone call to be transferred to the correct office is less than
1.4 minutes. A random sample of 25 incoming telephone calls has a standard
deviation of 1.1 minutes. At α = 0.10, is there enough evidence to support the
company’s claim? Assume the population is normally distributed.

> Solution

The claim is “the standard deviation is less than 1.4 minutes.” So, the null and
alternative hypotheses are

H₀: σ ≥ 1.4 minutes and Hₐ: σ < 1.4 minutes. (Claim)

The test is a left-tailed test, the level of significance is α = 0.10, and the
degrees of freedom are

d.f. = 25 − 1 = 24.

So, the critical value is

χ²₀ = 15.659.

The rejection region is χ² < 15.659. Using the chi-square test and assuming
σ = 1.4, the standardized test statistic is

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(25-1)(1.1)^2}{(1.4)^2} \approx 14.816.$$

[Graph: chi-square distribution with d.f. = 24, showing χ² ≈ 14.816 and the critical value χ²₀ = 15.659; the left-tail rejection region has area α = 0.10.]

The graph shows the location of the rejection region and the standardized test
statistic χ². Because χ² is in the rejection region, you should reject the null
hypothesis.

Interpretation There is enough evidence at the 10% level of significance to
support the claim that the standard deviation of the lengths of time it takes an
incoming telephone call to be transferred to the correct office is less than
1.4 minutes.
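
The Study Tip's warning about squaring the standard deviations can be illustrated with a short Python sketch. The function name `chi_square_from_std` is hypothetical, and the critical value 15.659 is copied from the example rather than computed.

```python
def chi_square_from_std(n, sample_std, hypothesized_std):
    """Chi-square statistic when standard deviations, not variances, are given."""
    # Square both standard deviations to obtain the variances the formula needs.
    return (n - 1) * sample_std ** 2 / hypothesized_std ** 2

# Values from the example: n = 25, s = 1.1 minutes, hypothesized sigma = 1.4 minutes.
chi2 = chi_square_from_std(25, 1.1, 1.4)
critical_value = 15.659  # table value for d.f. = 24, left tail, alpha = 0.10

# Left-tailed test: reject H0 when the statistic falls below the critical value.
reject_h0 = chi2 < critical_value

# Forgetting to square gives a different statistic and, here, a different decision.
unsquared = (25 - 1) * 1.1 / 1.4
print(round(chi2, 3), reject_h0)
print(round(unsquared, 3), unsquared < critical_value)
```

With the squares in place the statistic is about 14.816 and falls in the rejection region. Without them it is about 18.857, which would wrongly lead to failing to reject the null hypothesis.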


> Try It Yourself 5 


A police chief claims that the standard deviation of the lengths of response times
is less than 3.7 minutes. A random sample of 9 response times has a standard
deviation of 3.0 minutes. At α = 0.05, is there enough evidence to support the
police chief’s claim? Assume the population is normally distributed.

a. Identify the claim and state H₀ and Hₐ.
b. Identify the level of significance α and the degrees of freedom.
c. Find the critical value and identify the rejection region.
d. Find the standardized test statistic χ².
e. Decide whether to reject the null hypothesis. Use a graph if necessary.
f. Interpret the decision in the context of the original claim.

Answer: Page A42


EXAMPLE 6


» Using a Hypothesis Test for the Population Variance

A sporting goods manufacturer claims that the variance of the strengths
of a certain fishing line is 15.9. A random sample of 15 fishing line spools has
a variance of 21.8. At α = 0.05, is there enough evidence to reject the
manufacturer’s claim? Assume the population is normally distributed.

> Solution

The claim is “the variance is 15.9.” So, the null and alternative hypotheses are

H₀: σ² = 15.9 (Claim) and Hₐ: σ² ≠ 15.9.

The test is a two-tailed test, the level of significance is α = 0.05, and the
degrees of freedom are

d.f. = 15 − 1 = 14.

So, the critical values are χ²_L = 5.629 and χ²_R = 26.119.

The rejection regions are χ² < 5.629 and χ² > 26.119. Using the chi-square test
and assuming σ² = 15.9, the standardized test statistic is

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(15-1)(21.8)}{15.9} \approx 19.195.$$

[Graph: chi-square distribution with d.f. = 14, showing χ² ≈ 19.195 between the critical values χ²_L = 5.629 and χ²_R = 26.119; each tail rejection region has area ½α = 0.025.]

The graph shows the location of the rejection regions and the standardized test
statistic χ². Because χ² is not in the rejection regions, you should fail to reject
the null hypothesis.

Interpretation There is not enough evidence at the 5% level of significance to
reject the claim that the variance of the strengths of the fishing line is 15.9.
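
For a two-tailed test the statistic has to be compared against both critical values. A minimal Python sketch of that comparison follows; the helper `two_tailed_decision` is a hypothetical name, and both critical values are copied from the example.

```python
def two_tailed_decision(chi2, lower_critical, upper_critical):
    """Return True when chi2 lies in either rejection region."""
    return chi2 < lower_critical or chi2 > upper_critical

# Values from the example: n = 15, s^2 = 21.8, claimed sigma^2 = 15.9.
n, sample_variance, hypothesized_variance = 15, 21.8, 15.9
chi2 = (n - 1) * sample_variance / hypothesized_variance

# Critical values for d.f. = 14 with area 0.025 in each tail.
reject_h0 = two_tailed_decision(chi2, 5.629, 26.119)
print(round(chi2, 3), "reject H0" if reject_h0 else "fail to reject H0")
```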


> Try It Yourself 6 


A company that offers dieting products and weight loss services claims
that the variance of the weight losses of their users is 25.5. A random
sample of 13 users has a variance of 10.8. At α = 0.10, is there enough
evidence to reject the company’s claim? Assume the population is normally
distributed.

a. Identify the claim and state H₀ and Hₐ.
b. Identify the level of significance α and the degrees of freedom.
c. Find the critical values and identify the rejection regions.
d. Find the standardized test statistic χ².
e. Decide whether to reject the null hypothesis. Use a graph if necessary.
f. Interpret the decision in the context of the original claim.

Answer: Page A42

EXERCISES

BUILDING BASIC SKILLS AND VOCABULARY

1. Explain how to find critical values in a χ²-sampling distribution.

2. Can a critical value for the χ²-test be negative? Explain.

3. When testing a claim about a population mean or a population standard
deviation, a requirement is that the sample is from a population that is
normally distributed. How is this requirement different between the two tests?

4. Explain how to test a population variance or a population standard deviation.

In Exercises 5-10, find the critical value(s) for the indicated test for a population
variance, sample size n, and level of significance α.

5. Right-tailed test, n = 27, α = 0.05
6. Right-tailed test, n = 10, α = 0.10
7. Left-tailed test, n = 7, α = 0.01
8. Left-tailed test, n = 24, α = 0.05
9. Two-tailed test, n = 81, α = 0.10
10. Two-tailed test, n = 61, α = 0.01


Graphical Analysis In Exercises 11-14, state whether the standardized test
statistic χ² allows you to reject the null hypothesis.

11. [Graph: chi-square distribution with critical value χ²₀ = 6.251]
(a) χ² = 2.091
(b) χ² = 0
(c) χ² = 1.086
(d) χ² = 6.3471

12. [Graph: chi-square distribution with critical values χ²_L = 0.711 and χ²_R = 9.488]
(a) χ² = 0.771
(b) χ² = 9.486
(c) χ² = 0.701
(d) χ² = 9.508

13. [Graph: chi-square distribution with critical values χ²_L = 8.547 and χ²_R = 22.307]
(a) χ² = 22.302
(b) χ² = 23.309
(c) χ² = 8.457
(d) χ² = 8.577

14. [Graph: chi-square distribution with critical value χ²₀ = 10.645]
(a) χ² = 10.065
(b) χ² = 10.075
(c) χ² = 10.585
(d) χ² = 10.745

In Exercises 15-18, use a χ²-test to test the claim about the population variance σ²
or standard deviation σ at the given level of significance α using the given sample
statistics. For each claim, assume the population is normally distributed.

15. Claim: σ² = 0.52; α = 0.05. Sample statistics: s² = 0.508, n = 18
16. Claim: σ² = 8.5; α = 0.05. Sample statistics: s² = 7.45, n = 23
17. Claim: σ = 24.9; α = 0.10. Sample statistics: s = 29.1, n = 51
18. Claim: σ < 40; α = 0.01. Sample statistics: s = 40.8, n = 12


USING AND INTERPRETING CONCEPTS

Testing Claims In Exercises 19-28, (a) write the claim mathematically and
identify H₀ and Hₐ, (b) find the critical value(s) and identify the rejection
region(s), (c) find the standardized test statistic χ², (d) decide whether to reject or
fail to reject the null hypothesis, and (e) interpret the decision in the context of the
original claim. For each claim, assume the population is normally distributed.

19. Carbohydrates A snack food manufacturer estimates that the variance of
the number of grams of carbohydrates in servings of its tortilla chips is 1.25.
A dietician is asked to test this claim and finds that a random sample of
22 servings has a variance of 1.35. At α = 0.05, is there enough evidence to
reject the manufacturer’s claim?

20. Hybrid Vehicle Gas Mileage An auto manufacturer believes that the
variance of the gas mileages of its hybrid vehicles is 1.0. You work for an
energy conservation agency and want to test this claim. You find that a
random sample of the gas mileages of 25 of the manufacturer’s hybrid
vehicles has a variance of 1.65. At α = 0.05, do you have enough evidence to
reject the manufacturer’s claim? (Adapted from Green Hybrid)

21. Science Assessment Tests On a science assessment test, the scores of a
random sample of 22 eighth grade students have a standard deviation of
33.4 points. This result prompts a test administrator to claim that the
standard deviation for eighth graders on the examination is less than 36
points. At α = 0.10, is there enough evidence to support the administrator’s
claim? (Adapted from National Center for Educational Statistics)

22. U.S. History Assessment Tests A state school administrator says that the
standard deviation of test scores for eighth grade students who took a
U.S. history assessment test is less than 30 points. You work for the
administrator and are asked to test this claim. You randomly select 18 tests
and find that the tests have a standard deviation of 33.6 points. At α = 0.01,
is there enough evidence to support the administrator’s claim? (Adapted from
National Center for Educational Statistics)

23. Tornadoes A weather service claims that the standard deviation of the
number of fatalities per year from tornadoes is no more than 25. A random
sample of the number of deaths for 28 years has a standard deviation of
31 fatalities. At α = 0.10, is there enough evidence to reject the weather
service’s claim? (Source: NOAA Weather Partners)

24. Lengths of Stay A doctor says the standard deviation of the lengths of stay
for patients involved in a crash in which the vehicle struck a tree is 6.14 days.
A random sample of 20 lengths of stay for patients involved in this type of
crash has a standard deviation of 6.5 days. At α = 0.05, can you reject the
doctor’s claim? (Adapted from National Highway Traffic Safety Administration)

TI-83/84 PLUS

χ²cdf(0, 43.2, 40)
.6637768667

25. Total Charges An insurance agent says the standard deviation of the total 
hospital charges for patients involved in a crash in which the vehicle struck 
a construction barricade is less than $3500. A random sample of 28 total 
hospital charges for patients involved in this type of crash has a standard 
deviation of $4100. At α = 0.10, can you support the agent’s claim? (Adapted
from National Highway Traffic Safety Administration) 


26. Hotel Room Rates A travel agency estimates that the standard deviation 
of the room rates of hotels in a certain city is no more than $30. You work for 
a consumer advocacy group and are asked to test this claim. You find that a 
random sample of 21 hotels has a standard deviation of $35.25. At α = 0.01,
do you have enough evidence to reject the agency’s claim? 


27. Salaries The annual salaries (in dollars) of 18 randomly chosen 
environmental engineers are listed. At α = 0.05, can you conclude that the
standard deviation of the annual salaries is greater than $6100? (Adapted 
from Salary.com) 


63,125 59,749 52,369 55,979 61,550 54,644 50,420 
47,291 51,357 56,901 53,499 49,998 69,712 64,575 
45,850 46,297 63,770 71,589 


28. Salaries A staffing organization states that the standard deviation of 
the annual salaries of commodity buyers is at least $10,600. The annual 
salaries (in dollars) of 20 randomly chosen commodity buyers are listed. At 
α = 0.10, can you reject the organization’s claim? (Adapted from Salary.com)


79,319 68,825 65,129 75,899 85,070 76,270 68,750 
70,982 69,237 63,470 79,025 55,880 80,985 75,264 
66,918 65,459 70,598 86,579 71,225 57,311 


In Exercises 29-32, use StatCrunch to help you test the claim about the
population variance σ² or standard deviation σ at the given level of significance α
using the given sample statistics. For each claim, assume the population is normally
distributed.

29. Claim: σ² = 9; α = 0.01. Sample statistics: s² = 2.03, n = 10
30. Claim: σ² = 14.85; α = 0.05. Sample statistics: s² = 28.75, n = 17
31. Claim: σ > 4.5; α = 0.05. Sample statistics: s = 5.8, n = 15
32. Claim: σ ≠ 418; α = 0.10. Sample statistics: s = 305, n = 24


EXTENDING CONCEPTS

P-Values You can calculate the P-value for a χ²-test using technology. After
calculating the χ²-test value, you can use the cumulative distribution function (CDF) to
calculate the area under the curve. From Example 4 on page 407, χ² = 43.2. Using
a TI-83/84 Plus (choose 7 from the DISTR menu), enter 0 for the lower bound,
43.2 for the upper bound, and 40 for the degrees of freedom, as shown in the
TI-83/84 Plus display.

The CDF gives the area to the left of χ² = 43.2, which is about 0.6638. Because
the test in Example 4 is right-tailed, the P-value is the area to the right,
approximately 1 − 0.6638 = 0.3362. Because P > α = 0.05, the
conclusion is to fail to reject H₀.
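
The same P-value can be approximated without a calculator. The sketch below is not the calculator's own method: it uses a standard power series for the regularized lower incomplete gamma function and only the Python standard library. The function name `chi2_cdf` is hypothetical.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, t) with a = df / 2 and t = x / 2.
    """
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = 1.0
    total = 1.0
    n = 0
    # Add terms t^n / ((a + 1)(a + 2)...(a + n)) until they stop changing the sum.
    while term > 1e-16 * total:
        n += 1
        term *= t / (a + n)
        total += term
    log_prefactor = a * math.log(t) - t - math.lgamma(a + 1.0)
    return math.exp(log_prefactor) * total

# Example 4: chi-square statistic 43.2 with 40 degrees of freedom, right-tailed test.
area_left = chi2_cdf(43.2, 40)
p_value = 1.0 - area_left
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(round(area_left, 4), round(p_value, 4), decision)
```

For a left-tailed test the P-value would be `area_left` itself, and for a two-tailed test it would be twice the smaller of the two tail areas.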


In Exercises 33-36, use the P-value method to perform the hypothesis test for the 
indicated exercise. 


33. Exercise 25
34. Exercise 26
35. Exercise 27
36. Exercise 28

USES AND ABUSES

Uses

[Figure: Do You Favor the Use of Full-Body Scanners at Airports in the U.S.?]

Hypothesis Testing Hypothesis testing is important in many different fields
because it gives a scientific procedure for assessing the validity of a claim about
a population. Some of the concepts in hypothesis testing are intuitive, but some
are not. For instance, the American Journal of Clinical Nutrition suggests that
eating dark chocolate can help prevent heart disease. A random sample of
healthy volunteers was assigned to eat 3.5 ounces of dark chocolate each day
for 15 days. After 15 days, the mean systolic blood pressure of the volunteers
was 6.4 millimeters of mercury lower. A hypothesis test could show whether this drop
in systolic blood pressure is significant or simply due to sampling error.

Careful inferences must be made concerning the results. In another part 
of the study, it was found that white chocolate did not result in similar 
benefits. So, the inference of health benefits cannot be extended to all types 
of chocolate. You also would not infer that you should eat large quantities of 
chocolate because the benefits must be weighed against known risks, such as 
weight gain, acne, and acid reflux. 


Abuses 


Not Using a Random Sample The entire theory of hypothesis testing is based 
on the fact that the sample is randomly selected. If the sample is not random, 
then you cannot use it to infer anything about a population parameter. 


Attempting to Prove the Null Hypothesis If the P-value for a hypothesis 
test is greater than the level of significance, you have not proven the null 
hypothesis is true—only that there is not enough evidence to reject it. For 
instance, with a P-value higher than the level of significance, a researcher 
could not prove that there is no benefit to eating dark chocolate—only that 
there is not enough evidence to support the claim that there is a benefit. 


Making Type I or Type II Errors Remember that a type I error is rejecting
a null hypothesis that is true and a type II error is failing to reject a null
hypothesis that is false. You can decrease the probability of a type I error by
lowering the level of significance. Generally, if you decrease the probability of
making a type I error, you increase the probability of making a type II error.
You can decrease the chance of making both types of errors by increasing the
sample size.
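
The link between the level of significance and the type I error rate can be seen in a small simulation. The sketch below is an illustration only: the population (mean 100, standard deviation 15), the sample size, the number of trials, and the function name `type_i_error_rate` are all assumptions chosen for the demonstration. It draws repeated samples from a population where the null hypothesis is true and counts how often a two-tailed z-test rejects it.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha, n=30, trials=2000, seed=1):
    """Estimate how often a two-tailed z-test rejects a true null hypothesis."""
    rng = random.Random(seed)
    mu0, sigma = 100.0, 15.0  # hypothetical population; H0: mu = 100 is true
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_critical:
            rejections += 1  # rejecting a true H0 is a type I error
    return rejections / trials

for alpha in (0.10, 0.05, 0.01):
    print(alpha, type_i_error_rate(alpha))
```

Each estimated rate should land near its α, so lowering α lowers the share of true null hypotheses that are rejected.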


EXERCISES


In Exercises 1-4, assume that you work in a transportation department. You are 
asked to write a report about the claim that 73% of U.S. adults who fly at least 
once a year favor full-body scanners at airports. (Adapted from Rasmussen Reports) 


1. Not Using a Random Sample How could you choose a random sample to 
test this hypothesis? 


2. Attempting to Prove the Null Hypothesis What is the null hypothesis in 
this situation? Describe how your report could be incorrect by trying to 
prove the null hypothesis. 


3. Making a Type I Error Describe how your report could make a type I error.


4. Making a Type II Error Describe how your report could make a type II
error.

A SUMMARY OF HYPOTHESIS TESTING

INSIGHT

Large sample sizes will usually increase the cost and effort of testing
a hypothesis, but they also tend to make your decision more reliable.

With hypothesis testing, perhaps more than any other area of statistics, it can be
difficult to see the forest for all the trees. To help you see the forest (the overall
picture), a summary of what you studied in this chapter is provided.

Writing the Hypotheses
■ You are given a claim about a population parameter μ, p, σ², or σ.
■ Rewrite the claim and its complement using ≤, ≥, = (for H₀) and >, <, ≠ (for Hₐ).
■ Identify the claim. Is it H₀ or Hₐ?

Specifying a Level of Significance
■ Specify α, the maximum acceptable probability of rejecting a valid H₀
(a type I error).

Specifying the Sample Size
■ Specify your sample size n.

Choosing the Test
■ Mean: H₀ describes a hypothesized population mean μ.
  ■ Use a z-test for any population if n ≥ 30.
  ■ Use a z-test if the population is normal and σ is known for any n.
  ■ Use a t-test if the population is normal and n < 30, but σ is unknown.
■ Proportion: H₀ describes a hypothesized population proportion p.
  ■ Use a z-test for any population if np ≥ 5 and nq ≥ 5.
■ Variance or Standard Deviation: H₀ describes a hypothesized population
variance σ² or standard deviation σ.
  ■ Use a χ²-test if the population is normal.

Sketching the Sampling Distribution
■ Use Hₐ to decide if the test is left-tailed, right-tailed, or two-tailed.

Finding the Standardized Test Statistic
■ Take a random sample of size n from the population.
■ Compute the test statistic x̄, p̂, or s².
■ Find the standardized test statistic z, t, or χ².

Making a Decision

Option 1. Decision based on rejection region
■ Use α to find the critical value(s) z₀, t₀, or χ²₀ and rejection region(s).
■ Decision Rule:
Reject H₀ if the standardized test statistic is in the rejection region.
Fail to reject H₀ if the standardized test statistic is not in the rejection region.

Option 2. Decision based on P-value
■ Use the standardized test statistic or a technology tool to find the P-value.
■ Decision Rule:
Reject H₀ if P ≤ α.
Fail to reject H₀ if P > α.
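
The "Choosing the Test" rules in this summary can be written as a small decision function. This is a sketch under stated assumptions: the function name `choose_test`, its parameter names, and the returned strings are all hypothetical, and the function returns None when none of the summary's rules applies.

```python
def choose_test(parameter, n, population_normal=False, sigma_known=False, p=None):
    """Pick a test by following the summary's rules; None if no rule applies."""
    if parameter == "mean":
        if n >= 30:
            return "z-test"  # any population, large sample
        if population_normal and sigma_known:
            return "z-test"  # normal population, sigma known, any n
        if population_normal:
            return "t-test"  # normal population, n < 30, sigma unknown
        return None
    if parameter == "proportion":
        q = 1 - p
        if n * p >= 5 and n * q >= 5:
            return "z-test"
        return None
    if parameter in ("variance", "standard deviation"):
        return "chi-square test" if population_normal else None
    return None

print(choose_test("mean", 42))
print(choose_test("mean", 12, population_normal=True))
print(choose_test("proportion", 30, p=0.04))
print(choose_test("variance", 18, population_normal=True))
```

The third call returns None because np = 1.2 is less than 5, so the normal approximation is not appropriate under the summary's rule.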
STUDY TIP

If your standardized test statistic is z or t, remember that these values
measure standard deviations from the mean. Values that are outside of ±3
indicate that H₀ is very unlikely. Values that are outside of ±5 indicate
that H₀ is almost impossible.

z-Test for a Hypothesized Mean μ (Section 7.2)

Test statistic: x̄
Critical value: z₀ (Use Table 4.)
Sampling distribution of sample means is a normal distribution.
If n ≥ 30, s can be used in place of σ.
Standardized test statistic: z

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where x̄ is the sample mean, μ is the hypothesized mean, σ is the population
standard deviation, and n is the sample size.

[Graphs: Left-Tailed, Two-Tailed, and Right-Tailed rejection regions]

z-Test for a Hypothesized Proportion p (Section 7.4)

Test statistic: p̂
Critical value: z₀ (Use Table 4.)
Sampling distribution of sample proportions is a normal distribution.
Standardized test statistic: z

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}, \qquad q = 1 - p$$

where p̂ is the sample proportion, p is the hypothesized proportion, and n is
the sample size.

t-Test for a Hypothesized Mean μ (Section 7.3)

Test statistic: x̄
Critical value: t₀ (Use Table 5.)
Sampling distribution of sample means is approximated by a t-distribution
with d.f. = n − 1.
Standardized test statistic: t

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where x̄ is the sample mean, μ is the hypothesized mean, s is the sample
standard deviation, and n is the sample size.

[Graphs: Left-Tailed, Two-Tailed, and Right-Tailed rejection regions]

χ²-Test for a Hypothesized Variance σ² or Standard Deviation σ (Section 7.5)

Test statistic: s²
Critical value: χ²₀ (Use Table 6.)
Sampling distribution is approximated by a chi-square distribution with
d.f. = n − 1.
Standardized test statistic: χ²

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

where n is the sample size, s² is the sample variance, and σ² is the
hypothesized variance.

[Graphs: Left-Tailed, Two-Tailed, and Right-Tailed rejection regions]
CHAPTER SUMMARY

What did you learn?                                                          EXAMPLE(S)   REVIEW EXERCISES

Section 7.1
■ How to state a null hypothesis and an alternative hypothesis               1            1-6
■ How to identify type I and type II errors                                  2            7-10
■ How to know whether to use a one-tailed or a two-tailed statistical test   3            7-10
■ How to interpret a decision based on the results of a statistical test     4            7-10

Section 7.2
■ How to find P-values and use them to test a mean μ                         1-3          11, 12
■ How to use P-values for a z-test                                           4-6          13, 14, 23-28
■ How to find critical values and rejection regions in a normal distribution 7, 8         15-18
■ How to use rejection regions for a z-test                                  9, 10        19-28

Section 7.3
■ How to find critical values in a t-distribution                            1-3          29-32
■ How to use the t-test to test a mean μ                                     4, 5         33-40
■ How to use technology to find P-values and use them with a t-test to
  test a mean μ                                                              6            41, 42

Section 7.4
■ How to use the z-test to test a population proportion p                    1, 2         43-52

Section 7.5
■ How to find critical values for a χ²-test                                  1-3          53-56
■ How to use the χ²-test to test a variance or a standard deviation          4-6          57-63

REVIEW EXERCISES

SECTION 7.1

In Exercises 1-6, use the given statement to represent a claim. Write its complement
and state which is H₀ and which is Hₐ.

1. μ ≤ 375
2. μ = 82
3. p < 0.205
4. μ ≠ 150,020
5. σ > 1.9
6. p = 0.64


In Exercises 7-10, do the following. 
(a) State the null and alternative hypotheses, and identify which represents the 
claim. 


(b) Determine when a type I or type II error occurs for a hypothesis test of the 
claim. 


(c) Determine whether the hypothesis test is left-tailed, right-tailed, or two-tailed. 
Explain your reasoning. 


(d) Explain how you should interpret a decision that rejects the null hypothesis. 
(e) Explain how you should interpret a decision that fails to reject the null 
hypothesis. 


7. A news outlet reports that the proportion of Americans who support plans 
to order deep cuts in executive compensation at companies that have 
received federal bailout funds is 71%. (Source: ABC News) 


8. An agricultural cooperative guarantees that the mean shelf life of a certain 
type of dried fruit is at least 400 days. 


9. A soup maker says that the standard deviation of the sodium content in one 
serving of a certain soup is no more than 50 milligrams. (Adapted from 
Consumer Reports) 


10. An energy bar maker claims that the mean number of grams of 
carbohydrates in one bar is less than 25. 


SECTION 7.2

In Exercises 11 and 12, find the P-value for the indicated hypothesis test with the
given standardized test statistic z. Decide whether to reject H₀ for the given level of
significance α.

11. Left-tailed test, z = −0.94, α = 0.05
12. Two-tailed test, z = 2.57, α = 0.10

In Exercises 13 and 14, use a P-value to test the claim about the population mean
μ using the given sample statistics. State your decision for α = 0.10, α = 0.05, and
α = 0.01 levels of significance. If convenient, use technology.

13. Claim: μ = 0.05; Sample statistics: x̄ = 0.057, s = 0.018, n = 32
14. Claim: μ ≠ 230; Sample statistics: x̄ = 216.5, s = 17.3, n = 48

In Exercises 15-18, find the critical value(s) for the indicated z-test and level of
significance α. Include a graph with your answer.

15. Left-tailed test, α = 0.02
16. Two-tailed test, α = 0.005
17. Right-tailed test, α = 0.025
18. Two-tailed test, α = 0.08

In Exercises 19-22, state whether each standardized test statistic z allows you to
reject the null hypothesis. Explain your reasoning.

[Graph: standard normal curve with critical values −z₀ = −1.645 and z₀ = 1.645]

19. z = 1.631
20. z = 1.723
21. z = −1.464
22. z = −1.655

In Exercises 23-26, use a z-test to test the claim about the population mean μ at the given
level of significance α using the given sample statistics. If convenient, use technology.

23. Claim: μ = 45; α = 0.05. Sample statistics: x̄ = 47.2, s = 6.7, n = 42
24. Claim: μ ≠ 8.45; α = 0.03. Sample statistics: x̄ = 7.88, s = 1.75, n = 60
25. Claim: μ < 5.500; α = 0.01. Sample statistics: x̄ = 5.497, s = 0.011, n = 36
26. Claim: μ = 7450; α = 0.10. Sample statistics: x̄ = 7495, s = 243, n = 57

In Exercises 27 and 28, test the claim about the population mean μ using rejection
region(s) or a P-value. Interpret your decision in the context of the original claim.
If convenient, use technology.


27. The U.S. Department of Agriculture claims that the mean cost of raising a 
child from birth to age 2 by husband-wife families in rural areas is $10,380. A 
random sample of 800 children (age 2) has a mean cost of $10,240 with a 
standard deviation of $1561. At α = 0.01, is there enough evidence to reject
the claim? (Adapted from U.S. Department of Agriculture Center for Nutrition 
Policy and Promotion) 


28. A tourist agency in Hawaii claims the mean daily cost of meals and lodging 
for a family of 4 traveling in Hawaii is at most $650. You work for a consumer 
protection advocate and want to test this claim. In a random sample of 
45 families of 4 traveling in Hawaii, the mean daily cost of meals and lodging 
is $657 with a standard deviation of $40. At α = 0.05, do you have enough
evidence to reject the tourist agency’s claim? (Adapted from American 
Automobile Association) 


SECTION 7.3

In Exercises 29-32, find the critical value(s) for the indicated t-test, level of
significance α, and sample size n.

29. Two-tailed test, α = 0.05, n = 20
30. Right-tailed test, α = 0.01, n = 8
31. Left-tailed test, α = 0.005, n = 15
32. Two-tailed test, α = 0.02, n = 12

In Exercises 33-38, use a t-test to test the claim about the population mean μ at the
given level of significance α using the given sample statistics. For each claim,
assume the population is normally distributed. If convenient, use technology.

33. Claim: μ ≠ 95; α = 0.05. Sample statistics: x̄ = 94.1, s = 1.53, n = 12
34. Claim: μ > 12,700; α = 0.005. Sample statistics: x̄ = 12,855, s = 248, n = 21
35. Claim: μ = 0; α = 0.10. Sample statistics: x̄ = −0.45, s = 1.38, n = 16
36. Claim: μ = 4.20; α = 0.02. Sample statistics: x̄ = 4.61, s = 0.33, n = 9
37. Claim: μ = 48; α = 0.01. Sample statistics: x̄ = 52, s = 2.5, n = 7
38. Claim: μ < 850; α = 0.025. Sample statistics: x̄ = 875, s = 25, n = 14

In Exercises 39 and 40, use a t-test to test the claim. Interpret your decision in the
context of the original claim. For each claim, assume the population is normally
distributed. If convenient, use technology.


39. A fitness magazine advertises that the mean monthly cost of joining 
a health club is $25. You work for a consumer advocacy group and are 
asked to test this claim. You find that a random sample of 18 clubs has 
a mean monthly cost of $26.25 and a standard deviation of $3.23. 
At α = 0.10, do you have enough evidence to reject the advertisement’s
claim? 


40. A fitness magazine claims that the mean cost of a yoga session is no more 
than $14. You work for a consumer advocacy group and are asked to test this 
claim. You find that a random sample of 29 yoga sessions has a mean cost of 
$15.59 and a standard deviation of $2.60. At α = 0.025, do you have enough
evidence to reject the magazine’s claim? 


In Exercises 41 and 42, use a t-statistic and its P-value to test the claim about the
population mean μ using the given data. Interpret your decision in the context of
the original claim. For each claim, assume the population is normally distributed.
If convenient, use technology.


41. An education publication claims that the mean expenditure per student in 
public elementary and secondary schools is at least $10,200. You 
want to test this claim. You randomly select 16 school districts and find the 
average expenditure per student. The results are listed below. At α = 0.01,
can you reject the publication’s claim? (Adapted from National Center for 
Education Statistics) 


9,242 10,857 10,377 8,935 9,545 9,974 
9,847 10,641 9,364 10,157 9,784 9,962 
10,065 9,851 9,763 9,969 


42. A restaurant association says the typical household in the United States 
spends a mean amount of $2698 per year on food away from home. You are 
a consumer reporter for a national publication and want to test this claim. 
A random sample of 28 U.S. households has a mean amount spent on food 
away from home of $2764 and a standard deviation of $322. At α = 0.05, do
you have enough evidence to reject the association’s claim? (Adapted from 
U.S. Bureau of Labor Statistics) 


SECTION 7.4

In Exercises 43-50, decide whether the normal sampling distribution can be used
to approximate the binomial distribution. If it can, use the z-test to test the claim
about the population proportion p at the given level of significance α using the
given sample statistics. If convenient, use technology.

43. Claim: p = 0.15; α = 0.05. Sample statistics: p̂ = 0.09, n = 40
44. Claim: p < 0.70; α = 0.01. Sample statistics: p̂ = 0.50, n = 68
45. Claim: p < 0.09; α = 0.08. Sample statistics: p̂ = 0.07, n = 75
46. Claim: p = 0.65; α = 0.03. Sample statistics: p̂ = 0.76, n = 116
47. Claim: p = 0.04; α = 0.10. Sample statistics: p̂ = 0.03, n = 30
48. Claim: p ≠ 0.34; α = 0.01. Sample statistics: p̂ = 0.29, n = 60
49. Claim: p ≠ 0.24; α = 0.02. Sample statistics: p̂ = 0.32, n = 50
50. Claim: p < 0.80; α = 0.10. Sample statistics: p̂ = 0.85, n = 43

In Exercises 51 and 52, test the claim about the population proportion p. Interpret
your decision in the context of the original claim. If convenient, use technology.


51. A polling agency reports that over 16% of U.S. adults are without health care 
coverage. In a random survey of 1420 U.S. adults, 256 said they did not have
health care coverage. At α = 0.02, is there enough evidence to support the
agency’s claim? (Source: The Gallup Poll) 


52. The Western blot assay is a blood test for the presence of HIV. It has 
been found that this test sometimes gives false positive results for HIV. A 
medical researcher claims that the rate of false positives is 2%. A recent 
study of 300 randomly selected U.S. blood donors who do not have HIV 
found that 3 received a false positive test result. At α = 0.05, is there enough
evidence to reject the researcher’s claim? (Adapted from Centers for Disease 
Control and Prevention) 


SECTION 7.5

In Exercises 53-56, find the critical value(s) for the indicated χ²-test for a
population variance, sample size n, and level of significance α.

53. Right-tailed test, n = 20, α = 0.05
54. Two-tailed test, n = 14, α = 0.01
55. Right-tailed test, n = 51, α = 0.10
56. Left-tailed test, n = 6, α = 0.05

In Exercises 57-60, use a χ²-test to test the claim about the population variance σ²
or standard deviation σ at the given level of significance α and using the given
sample statistics. For each claim, assume the population is normally distributed.

57. Claim: σ² > 2; α = 0.10. Sample statistics: s² = 2.95, n = 18
58. Claim: σ² < 60; α = 0.025. Sample statistics: rH 927 215
59. Claim: σ = 1.25; α = 0.05. Sample statistics: s = 1.03, n = 6
60. Claim: σ ≠ 0.035; α = 0.01. Sample statistics: s = 0.026, n = 16


In Exercises 61 and 62, test the claim about the population variance or standard 
deviation. Interpret your decision in the context of the original claim. For each 
claim, assume the population is normally distributed. 


61. A bolt manufacturer makes a type of bolt to be used in airtight containers. 
The manufacturer needs to be sure that all of its bolts are very similar in 
width, so it sets an upper tolerance limit for the variance of bolt width at 0.01. 
A random sample of the widths of 28 bolts has a variance of 0.064. At 
α = 0.005, is there enough evidence to reject the manufacturer’s claim? 


62. A restaurant claims that the standard deviation of the lengths of serving 
times is 3 minutes. A random sample of 27 serving times has a standard 
deviation of 3.9 minutes. At α = 0.01, is there enough evidence to reject the 
restaurant’s claim? 


63. In Exercise 62, is there enough evidence to reject the restaurant’s claim at the 
α = 0.05 level? Explain. 
CHAPTER QUIZ 


Take this quiz as you would take a quiz in class. After you are done, check 
your work against the answers given in the back of the book. If convenient, use 
technology. 


For this quiz, do the following. 
(a) Write the claim mathematically. Identify H₀ and Hₐ. 


(b) Determine whether the hypothesis test is one-tailed or two-tailed and whether 
to use a z-test, a t-test, or a χ²-test. Explain your reasoning. 


(c) If necessary, find the critical value(s) and identify the rejection region(s). 
(d) Find the appropriate test statistic. If necessary, find the P-value. 
(e) Decide whether to reject or fail to reject the null hypothesis. 


(f) Interpret the decision in the context of the original claim. 


1. A research service estimates that the mean annual consumption of vegetables 
and melons by people in the United States is at least 170 pounds per person. 
A random sample of 360 people in the United States has a mean consumption 
of vegetables and melons of 168.5 pounds per year and a standard deviation 
of 11 pounds. At α = 0.03, is there enough evidence to reject the service’s 
claim that the mean consumption of vegetables and melons by people in the 
United States is at least 170 pounds per person? (Adapted from U.S. Department 
of Agriculture) 


2. A hat company states that the mean hat size for a male is at least 7.25. A 
random sample of 12 hat sizes has a mean of 7.15 and a standard deviation of 
0.27. At α = 0.05, can you reject the company’s claim that the mean hat size 
for a male is at least 7.25? Assume the population is normally distributed. 


3. A maker of microwave ovens advertises that no more than 10% of its 
microwaves need repair during the first 5 years of use. In a random sample of 
57 microwaves that are 5 years old, 13% needed repairs. At α = 0.04, can you 
reject the maker’s claim that no more than 10% of its microwaves need repair 
during the first five years of use? (Adapted from Consumer Reports) 


4. A state school administrator says that the standard deviation of SAT critical 
reading test scores is 112. A random sample of 19 SAT critical reading test 
scores has a standard deviation of 143. At α = 0.10, test the administrator’s 
claim. What can you conclude? Assume the population is normally 
distributed. (Adapted from The College Board) 


5. A government agency reports that the mean amount of earnings for full-time 
workers ages 25 to 34 with a master’s degree is $62,569. In a random sample 
of 15 full-time workers ages 25 to 34 with a master’s degree, the mean amount 
of earnings is $59,231 and the standard deviation is $5945. Is there enough 
evidence to reject the agency’s claim? Use a P-value and α = 0.05. Assume 
the population is normally distributed. (Adapted from U.S. Census Bureau) 


6. A tourist agency in Kansas claims the mean daily cost of meals and lodging 
for a family of 4 traveling in the state is $201. You work for a consumer 
protection advocate and want to test this claim. In a random sample of 
35 families of 4 traveling in Kansas, the mean daily cost of meals and lodging 
is $216 and the standard deviation is $30. Do you have enough evidence to 
reject the agency’s claim? Use a P-value and α = 0.05. (Adapted from American 
Automobile Association) 
Real Statistics - Real Decisions 


In the 1970s and 1980s, PepsiCo, maker of Pepsi®, began airing television 
commercials in which it claimed more cola drinkers preferred Pepsi® 
over Coca-Cola® in a blind taste test. The Coca-Cola Company, maker 
of Coca-Cola®, was the market leader in soda sales. After the television 
ads began airing, Pepsi® sales increased and began rivaling Coca-Cola® 
sales. 

Assume the claim is that more than 50% of cola drinkers preferred 
Pepsi® over Coca-Cola®. You work for an independent market research 
firm and are asked to test this claim. 


1. How Would You Do It? 


(a) When PepsiCo performed this challenge, PepsiCo representatives 
went to shopping malls to obtain their sample. Do you think this 
type of sampling is representative of the population? Explain. 


(b) What sampling technique would you use to select the sample for 
your study? 


(c) Identify possible flaws or biases in your study. 


2. Testing a Proportion 


In your study, 280 out of 560 cola drinkers prefer Pepsi® over 
Coca-Cola®. Using these results, test the claim that more than 
50% of cola drinkers prefer Pepsi® over Coca-Cola®. Use α = 0.05. 
Interpret your decision in the context of the original claim. Does 
the decision support PepsiCo’s claim? 


3. Labeling Influence 


The Baylor College of Medicine decided to replicate this taste 
test by monitoring brain activity while conducting the test on 
participants. They also wanted to see if brand labeling would affect 
the results. When participants were shown which cola they were 
sampling, Coca-Cola® was preferred by 75% of the participants. 
What conclusions can you draw from this study? 


4. Your Conclusions 
(a) Why do you think PepsiCo used a blind taste test? 


(b) Do you think brand image or taste has more influence on 
consumer preferences for cola? 

(c) What other factors may influence consumer preferences besides 
taste and branding? 


TECHNOLOGY 


MINITAB TI-83/84 PLUS 


THE CASE OF THE VANISHING 
WOMEN 


53% → 29% → 9% → 0% 


From 1966 to 1968, Dr. Benjamin Spock and others 
were tried for conspiracy to violate the Selective 
Service Act by encouraging resistance to the 
Vietnam War. By a series of three selections, no 
women ended up being on the jury. In 1969, Hans 
Zeisel wrote an article in The University of Chicago 
Law Review using statistics and hypothesis 
testing to argue that the jury selection was biased 
against Dr. Spock. Dr. Spock was a well-known 
pediatrician and author of books about raising 
children. Millions of mothers had read his books 
and followed his advice. Zeisel argued that, by 
keeping women off the jury, the court prejudiced 
the verdict. 

The jury selection process for Dr. Spock’s 
trial is shown at the right. 


EXERCISES 


1. The MINITAB display below shows a 
hypothesis test for a claim that the proportion 
of women in the city directory is p = 0.53. In 
the test, n = 350 and p̂ ≈ 0.2914. Should you 
reject the claim? What is the level of 
significance? Explain. 


2. In Exercise 1, you rejected the claim that 
p = 0.53. But this claim was true. What type of 
error is this? 


3. If you reject a true claim with a level of 
significance that is virtually zero, what can you 
infer about the randomness of your sampling 
process? 


MINITAB 


Test and CI for One Proportion 
Test of p = 0.53 vs p not = 0.53 


Sample x N Sample p 
1 102 350 0.291429 


Using the normal approximation. 


Stage 1. The clerk of the Federal District Court 
selected 350 people “at random” from the Boston 
City Directory. The directory contained several 
hundred names, 53% of whom were women. 
However, only 102 of the 350 people selected 
were women. 


Stage 2. The trial judge, Judge Ford, selected 
100 people “at random” from the 350 people. This 
group was called a venire and it contained only 
nine women. 


Stage 3. The court clerk assigned numbers to the 
members of the venire and, one by one, they were 
interrogated by the attorneys for the prosecution 
and defense until 12 members of the jury were 
chosen. At this stage, only one potential female 
juror was questioned, and she was eliminated by 
the prosecutor under his quota of peremptory 
challenges (for which he did not have to give a 
reason). 


4. Describe a hypothesis test for Judge Ford’s 
“random” selection of the venire. Use a claim 
of p = 102/350 ≈ 0.2914. 

(a) Write the null and alternative hypotheses. 
(b) Use a technology tool to perform the test. 
(c) Make a decision. 
(d) Interpret the decision in the context of 
the original claim. Could Judge Ford’s 
selection of 100 venire members have been 
random? 
99% CI Z-Value P-Value 
(0.228862, 0.353995) -8.94 0.000 
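
The Z-Value and P-Value in the MINITAB display can be checked by hand with the normal approximation. The following is a minimal Python sketch, not MINITAB's own procedure; the function name `one_proportion_z_test` is an illustrative assumption, and the inputs (102 women out of 350, claimed p = 0.53) come from the display.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, p0):
    """Two-tailed z-test for a proportion (normal approximation).

    Hypothetical helper for illustration: returns the sample
    proportion, the standardized test statistic z, and the P-value.
    """
    p_hat = x / n
    # Under H0 the standard error uses the claimed proportion p0.
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Two-tailed P-value: twice the area beyond |z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return p_hat, z, p_value


p_hat, z, p_value = one_proportion_z_test(x=102, n=350, p0=0.53)
print(f"sample p = {p_hat:.6f}, z = {z:.2f}, P-value = {p_value:.3f}")
```

A z-value this far from 0 gives a P-value that rounds to 0.000, which is why the display supports rejecting the claim at any commonly used level of significance.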


Extended solutions are given in the Technology Supplement. 


Technical instruction is provided for MINITAB, Excel, and the TI-83/84 Plus. 
USING TECHNOLOGY TO PERFORM HYPOTHESIS TESTS 


[MINITAB menu screens listing: Display Descriptive Statistics..., Store Descriptive Statistics..., Graphical Summary..., 1-Sample Z..., 1-Sample t..., 2-Sample t..., Paired t..., 1 Proportion..., 2 Proportions...] 

Here are some MINITAB and TI-83/84 Plus printouts for some of the 
examples in this chapter. 


MINITAB 

(See Example 5, page 375.) 

One-Sample Z 

Test of mu = 22500 vs not = 22500 
The assumed standard deviation = 3015 

 N   Mean  SE Mean          95% CI      Z      P 
30  21545      550  (20466, 22624)  -1.73  0.083 


MINITAB 

(See Example 4, page 390.) 

One-Sample T 

Test of mu = 20500 vs < 20500 

 N   Mean  StDev  SE Mean  95% Upper Bound      T      P 
14  19850   1084      290            20363  -2.24  0.021 


MINITAB 

(See Example 2, page 400.) 

Test and CI for One Proportion 

Test of p = 0.25 vs p not = 0.25 

Sample   X    N  Sample p                90% CI  Z-Value  P-Value 
1       42  200  0.210000  (0.162627, 0.257373)    -1.31    0.191 

Using the normal approximation. 
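
The One-Sample Z printout can be reproduced from its four inputs. This is a hedged Python sketch using only the standard library, not a description of how MINITAB computes its output; the variable names are illustrative assumptions, while the numbers (hypothesized mean 22500, assumed standard deviation 3015, n = 30, sample mean 21545) come from the printout.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the One-Sample Z printout.
mu0, sigma = 22500, 3015
n, sample_mean = 30, 21545

std_normal = NormalDist()

# Standard error of the mean and standardized test statistic.
se_mean = sigma / sqrt(n)
z = (sample_mean - mu0) / se_mean

# Two-tailed P-value, because the alternative is "not = 22500".
p_value = 2 * std_normal.cdf(-abs(z))

# 95% confidence interval for the mean.
z_crit = std_normal.inv_cdf(0.975)
ci = (sample_mean - z_crit * se_mean, sample_mean + z_crit * se_mean)

print(f"SE Mean = {se_mean:.0f}, Z = {z:.2f}, P = {p_value:.3f}")
print(f"95% CI = ({ci[0]:.0f}, {ci[1]:.0f})")
```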
USING TECHNOLOGY TO PERFORM HYPOTHESIS TESTS 


(See Example 9, page 379.) 


TI-83/84 PLUS 

EDIT CALC TESTS 
1: Z-Test... 
2: T-Test... 
3: 2-SampZTest... 
4: 2-SampTTest... 
5: 1-PropZTest... 
6: 2-PropZTest... 
7↓ZInterval... 


TI-83/84 PLUS 

Z-Test 
Inpt: Data Stats 
μ₀: 68000 
σ: 5500 
x̄: 66900 
n: 30 
μ: ≠μ₀ <μ₀ >μ₀ 
Calculate Draw 


TI-83/84 PLUS 

Z-Test 
μ<68000 
z= -1.095445115 
p= .1366608782 
x̄= 66900 
n= 30 


(See Example 5, page 391.) 


TI-83/84 PLUS 

EDIT CALC TESTS 
1: Z-Test... 
2: T-Test... 
3: 2-SampZTest... 
4: 2-SampTTest... 
5: 1-PropZTest... 
6: 2-PropZTest... 
7↓ZInterval... 


TI-83/84 PLUS 

T-Test 
Inpt: Data Stats 
μ₀: 6.8 
x̄: 6.7 
Sx: .24 
n: 19 
μ: ≠μ₀ <μ₀ >μ₀ 
Calculate Draw 


TI-83/84 PLUS 

T-Test 
μ≠6.8 
t= -1.816207893 
p= .0860316039 
x̄= 6.7 
Sx= .24 
n= 19 


TI-83/84 PLUS 

t=-1.8162 


(See Example 1, page 399.) 


TI-83/84 PLUS 

EDIT CALC TESTS 
1: Z-Test... 
2: T-Test... 
3: 2-SampZTest... 
4: 2-SampTTest... 
5: 1-PropZTest... 
6: 2-PropZTest... 
7↓ZInterval... 


TI-83/84 PLUS 

1-PropZTest 
p₀: .5 
x: 39 
n: 100 
prop ≠p₀ <p₀ >p₀ 
Calculate Draw 


TI-83/84 PLUS 

1-PropZTest 
prop< .5 
z= -2.2 
p= .0139033989 
p̂= .39 
n= 100 
HYPOTHESIS TESTING WITH TWO SAMPLES 


8.1 Testing the Difference Between Means (Large Independent Samples) 

8.2 Testing the Difference Between Means (Small Independent Samples) 

8.3 Testing the Difference Between Means (Dependent Samples) 

8.4 Testing the Difference Between Proportions 


The National Youth Tobacco Survey (NYTS), a report published by the 
Centers for Disease Control and Prevention, provides information on the most 
widely used tobacco products among U.S. students. One of the national health 
objectives is to reduce current cigarette use among high school students. 


WHERE YOU'VE BEEN 


In Chapter 6, you were introduced to inferential 
statistics and you learned how to form confidence 
intervals to estimate a parameter. Then, in 
Chapter 7, you learned how to test a claim about 
a population parameter, basing your decision on 
sample statistics and their distributions. 


The National Youth Tobacco Survey (NYTS) is 
a study conducted by the Centers for Disease 
Control and Prevention to provide information 
about student use of tobacco products. As part of 
this study in a recent year, a random sample of 
7000 U.S. male high school students was 
surveyed. The following proportions were found. 


Male High School Students (n = 7000) 


Characteristic                                              Frequency  Proportion 
Smoke cigarettes (at least one in the last 30 days)              1484       0.212 
Smoke cigars (at least one in the last 30 days)                  1162       0.166 
Use smokeless tobacco (at least once in the last 30 days)         770       0.110 


WHERE YOU’RE GOING 


In this chapter, you will continue your study of 
inferential statistics and hypothesis testing. Now, 
however, instead of testing a hypothesis about a 
single population, you will learn how to test a 
hypothesis that compares two populations. 


For instance, in the NYTS study a random 
sample of 7489 U.S. female high school students 
was also surveyed. Here are the study’s findings 
for this second group. 


Female High School Students (n = 7489) 


Characteristic                                              Frequency  Proportion 
Smoke cigarettes (at least one in the last 30 days)              1378       0.184 
Smoke cigars (at least one in the last 30 days)                   539       0.072 
Use smokeless tobacco (at least once in the last 30 days)         112       0.015 


From these two samples, can you conclude that 
there is a difference in the proportion of high 
school students who smoke cigarettes, smoke 
cigars, or use smokeless tobacco among males 
and among females? Or, might the differences in 
the proportions be due to chance? 


In this chapter, you will learn that you can answer 
these questions by testing the hypothesis that the 
two proportions are equal. For the proportions 
of students who use smokeless tobacco, for 
instance, you can conclude that the proportion of 
male high school students is different from the 
proportion of female high school students. 
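
As a preview of that kind of comparison, the smokeless tobacco counts from the two tables can be compared numerically. The sketch below is an illustration only: it assumes a pooled two-proportion z-test, which is a common textbook choice but is not spelled out in this chapter opener, and the function name `two_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-tailed z-test for p1 = p2 (assumed method, for illustration)."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled estimate of the common proportion under H0: p1 = p2.
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))
    return p1, p2, z, p_value


# Smokeless tobacco use: 770 of 7000 males, 112 of 7489 females.
p_male, p_female, z, p_value = two_proportion_z_test(770, 7000, 112, 7489)
print(f"male = {p_male:.3f}, female = {p_female:.3f}, z = {z:.1f}")
```

The standardized difference is more than 20 standard errors from 0, which is consistent with the conclusion stated above that the two proportions differ.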
WHAT YOU SHOULD LEARN 


» How to decide whether two 
samples are independent 
or dependent 


» An introduction to two-sample 
hypothesis testing for the 
difference between two 
population parameters 


» How to perform a two-sample 
z-test for the difference 
between two means μ₁ and 
μ₂ using large independent 
samples 


[Figure: Independent samples (Sample 1, Sample 2) and dependent samples (Sample 1, Sample 2)] 

INSIGHT 


Dependent samples 
often involve identical 
twins, before and after 
results for the same 
person or object, or 
results of individuals 
matched for specific 
characteristics. 


Testing the Difference Between Means 
(Large Independent Samples) 


Independent and Dependent Samples » An Overview of Two-Sample 
Hypothesis Testing » Two-Sample z-Test for the Difference Between Means 


> INDEPENDENT AND DEPENDENT SAMPLES 


In Chapter 7, you studied methods for testing a claim about the value of a 
population parameter. In this chapter, you will learn how to test a claim 
comparing parameters from two populations. When you compare the means of 
two different populations, the method you use to sample as well as the sample 
sizes will determine the type of test you will use. 


DEFINITION 


Two samples are independent if the sample selected from one population is 
not related to the sample selected from the second population. Two samples 
are dependent if each member of one sample corresponds to a member of the 
other sample. Dependent samples are also called paired samples or matched 
samples. 


EXAMPLE 1 


> Independent and Dependent Samples 


Classify each pair of samples as independent or dependent and justify 
your answer. 


1. Sample 1: Weights of 65 college students before their freshman year begins 
Sample 2: Weights of the same 65 college students after their freshman year 


2. Sample 1: Scores for 38 adult males on a psychological screening test for 
attention-deficit hyperactivity disorder 

Sample 2: Scores for 50 adult females on a psychological screening test for 
attention-deficit hyperactivity disorder 


> Solution 


1. These samples are dependent. Because the weights of the same students are 
taken, the samples are related. The samples can be paired with respect to 
each student. 


2. These samples are independent. It is not possible to form a pairing between 
the members of samples, the sample sizes are different, and the data 
represent scores for different individuals. 


> Try It Yourself 1 
Classify each pair of samples as independent or dependent. 


1. Sample 1: Systolic blood pressures of 30 adult females 
Sample 2: Systolic blood pressures of 30 adult males 

2. Sample 1: Midterm exam scores of 14 chemistry students 
Sample 2: Final exam scores of the same 14 chemistry students 


a. Determine whether the samples are independent or dependent. 
b. Explain your reasoning. Answer: Page A43 
INSIGHT 


The members in the 
two samples are not 
matched or paired, 
so the samples are 
independent. 
> AN OVERVIEW OF TWO-SAMPLE HYPOTHESIS TESTING 


In this section, you will learn how to test a claim comparing the means of two 
different populations using independent samples. 

For instance, suppose you are developing a marketing plan for an Internet 
service provider and want to determine whether there is a difference in the 
amounts of time male and female college students spend online each day. The 
only way you can conclude with certainty that there is a difference is to take a 
census of all college students, calculate the mean daily times male students and 
female students spend online, and find the difference. Of course, it is not 
practical to take such a census. However, you can still determine with some 
degree of certainty whether such a difference exists. 

You can begin by assuming that there is no difference in the mean times 
of the two populations. That is, μ₁ − μ₂ = 0. Then, by taking a random sample 
from each population, you can perform a two-sample hypothesis test using the 
test statistic x̄₁ − x̄₂. Suppose you obtain the following results. 


Population of Male          Population of Female 
College Students            College Students 

x̄₁ = 85 min                 x̄₂ = 81 min 
s₁ = 15 min                 s₂ = 17 min 
n₁ = 200                    n₂ = 250 


The graph below shows the sampling distribution of x̄₁ − x̄₂ for many similar 
samples taken from two populations for which μ₁ − μ₂ = 0. From the graph, you 
can see that it is quite unlikely to obtain sample means that differ by 4 minutes if 
the actual difference is 0. The difference of the sample means would be more than 
2.5 standard errors from the hypothesized difference of 0! So, you can conclude 
that there is a significant difference in the amounts of time male college students 
and female college students spend online each day. 
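
The "more than 2.5 standard errors" statement can be verified directly from the sample statistics. This short Python sketch assumes, as is usual for large samples, that the sample standard deviations stand in for the population standard deviations; the variable names are illustrative.

```python
from math import sqrt

# Sample statistics for the online-time example.
x1, s1, n1 = 85, 15, 200   # male college students
x2, s2, n2 = 81, 17, 250   # female college students

# Observed difference in sample means (the test statistic).
difference = x1 - x2

# Standard error of the difference, using s1 and s2 in place of
# the unknown population standard deviations (large-sample assumption).
standard_error = sqrt(s1**2 / n1 + s2**2 / n2)

# How many standard errors the observed difference lies from 0.
standard_errors_from_zero = difference / standard_error

print(f"difference = {difference} min")
print(f"standard error = {standard_error:.2f} min")
print(f"standardized value = {standard_errors_from_zero:.2f}")
```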


[Figure: Sampling distribution of x̄₁ − x̄₂, with the test statistic x̄₁ − x̄₂ = 85 − 81 = 4 marked. Horizontal axes: difference in sample means (in minutes), from −5 to 5, and standardized test statistic z, from −3 to 3.] 
STUDY TIP 


You can also write the null and 
alternative hypotheses as follows. 


H₀: μ₁ − μ₂ = 0    H₀: μ₁ − μ₂ ≤ 0    H₀: μ₁ − μ₂ ≥ 0 
Hₐ: μ₁ − μ₂ ≠ 0    Hₐ: μ₁ − μ₂ > 0    Hₐ: μ₁ − μ₂ < 0 


[Figure: Sampling distribution for x̄₁ − x̄₂] 


It is important to remember that when you perform a two-sample 
hypothesis test using independent samples, you are testing a claim concerning 
the difference between the parameters in two populations, not the values of the 
parameters themselves. 


DEFINITION 


For a two-sample hypothesis test with independent samples, 


1. the null hypothesis H₀ is a statistical hypothesis that usually states there 
is no difference between the parameters of two populations. The null 
hypothesis always contains the symbol ≤, =, or ≥. 

2. the alternative hypothesis Hₐ is a statistical hypothesis that is true when H₀ 
is false. The alternative hypothesis contains the symbol >, ≠, or <. 


To write the null and alternative hypotheses for a two-sample hypothesis 
test with independent samples, translate the claim made about the population 
parameters from a verbal statement to a mathematical statement. Then, write its 
complementary statement. For instance, if the claim is about two population 
parameters μ₁ and μ₂, then some possible pairs of null and alternative 
hypotheses are 


H₀: μ₁ = μ₂     H₀: μ₁ ≤ μ₂          H₀: μ₁ ≥ μ₂ 
Hₐ: μ₁ ≠ μ₂,    Hₐ: μ₁ > μ₂,   and   Hₐ: μ₁ < μ₂. 


Regardless of which hypotheses you use, you always assume there is no 
difference between the population means, or μ₁ = μ₂. 


> TWO-SAMPLE z-TEST FOR THE DIFFERENCE BETWEEN MEANS 


In the remainder of this section, you will learn how to perform a z-test for the 
difference between two population means μ₁ and μ₂. Three conditions are 
necessary to perform such a test. 


1. The samples must be randomly selected. 
2. The samples must be independent. 


3. Each sample size must be at least 30 or, if not, each population must have a 
normal distribution with a known standard deviation. 


If these requirements are met, then the sampling distribution for x̄₁ − x̄₂, the 
difference of the sample means, is a normal distribution with mean and standard 
error as follows. 


The mean of the difference of the sample means is the assumed difference between 
the two population means. When no difference is assumed, the mean is 0. 

\[ \text{Mean} = \mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2 \]

The variance of the sampling distribution is the sum of the variances of the individual 
sampling distributions for x̄₁ and x̄₂. The standard error is the square root of the 
sum of the variances. 

\[ \text{Standard error} = \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
There are about 112,900 public elementary and secondary school teachers in 
Georgia and about 119,300 in Ohio. In a survey, 200 public elementary and 
secondary school teachers in each state were asked to report their salary. 
The results were as follows. (Adapted from National Education Association) 


x̄₁ = $49,900    x̄₂ = $51,900 
s₁ = $6935      s₂ = $6584 


Is there enough evidence to conclude that there is a difference in the mean 
salaries of public elementary and secondary school teachers in Georgia and 
Ohio using α = 0.01? 
Because the sampling distribution for x̄₁ − x̄₂ is a normal distribution, you 
can use the z-test to test the difference between two population means μ₁ and 
μ₂. Notice that the standardized test statistic takes the form of 

\[ z = \frac{(\text{Observed difference}) - (\text{Hypothesized difference})}{\text{Standard error}}. \]


TWO-SAMPLE z-TEST FOR THE DIFFERENCE 
BETWEEN MEANS 


A two-sample z-test can be used to test the difference between two 
population means μ₁ and μ₂ when a large sample (at least 30) is randomly 
selected from each population and the samples are independent. The test 
statistic is x̄₁ − x̄₂, and the standardized test statistic is 

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} \quad \text{where} \quad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \]

When the samples are large, you can use s₁ and s₂ in place of σ₁ and σ₂. If 
the samples are not large, you can still use a two-sample z-test, provided the 
populations are normally distributed and the population standard deviations 
are known. 


If the null hypothesis states μ₁ = μ₂, μ₁ ≤ μ₂, or μ₁ ≥ μ₂, then μ₁ = μ₂ is 
assumed and the expression μ₁ − μ₂ is equal to 0 in the preceding test. 


GUIDELINES 


Using a Two-Sample z-Test for the Difference Between Means 
(Large Independent Samples) 


IN WORDS                                      IN SYMBOLS 

1. State the claim mathematically             State H₀ and Hₐ. 
   and verbally. Identify the null 
   and alternative hypotheses. 

2. Specify the level of significance.         Identify α. 

3. Determine the critical value(s).           Use Table 4 in Appendix B. 

4. Determine the rejection region(s). 

5. Find the standardized test statistic       z = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / σ(x̄₁ − x̄₂) 
   and sketch the sampling distribution. 

6. Make a decision to reject or fail to       If z is in the rejection region, 
   reject the null hypothesis.                reject H₀. Otherwise, fail to 
                                              reject H₀. 

7. Interpret the decision in the context 
   of the original claim. 


A hypothesis test for the difference between means can also be performed 
using P-values. Use the guidelines above, skipping Steps 3 and 4. After finding 
the standardized test statistic, use the Standard Normal Table to calculate the 
P-value. Then make a decision to reject or fail to reject the null hypothesis. If P 
is less than or equal to α, reject H₀. Otherwise, fail to reject H₀. 
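
The guidelines, together with the P-value alternative, can be sketched in code. The following Python function is an illustration under stated assumptions, not a standard library routine: the name `two_sample_z_test` and its `tail` parameter are hypothetical, the hypothesized difference is taken to be 0, and the sample standard deviations are used in place of σ₁ and σ₂ (large samples). It is applied to the teacher salary statistics given earlier in this section (n = 200 per state, α = 0.01), treating the first listed sample as sample 1.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(x1, s1, n1, x2, s2, n2, alpha, tail="two"):
    """Two-sample z-test for mu1 - mu2 = 0 (hypothetical helper).

    tail is "two", "left", or "right". Returns the standardized test
    statistic, the critical value, the P-value, and the decision.
    """
    std_normal = NormalDist()
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    # Hypothesized difference mu1 - mu2 is assumed to be 0.
    z = ((x1 - x2) - 0) / standard_error

    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        in_rejection_region = abs(z) > critical
        p_value = 2 * std_normal.cdf(-abs(z))
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
        in_rejection_region = z < critical
        p_value = std_normal.cdf(z)
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        in_rejection_region = z > critical
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return z, critical, p_value, in_rejection_region


# Teacher salary samples: two-tailed test at alpha = 0.01.
z, critical, p_value, reject_h0 = two_sample_z_test(
    49900, 6935, 200, 51900, 6584, 200, alpha=0.01, tail="two"
)
print(f"z = {z:.2f}, critical values = ±{critical:.3f}, P = {p_value:.4f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Both decision rules agree by construction: z falls in the rejection region exactly when the P-value is less than or equal to α.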
Sample Statistics for Credit 
Card Debt 


x̄₁ = $4446.25 | x̄₂ = $4567.24 
s₁ = $1045.70 | s₂ = $1361.95 
n₁ = 250      | n₂ = 250 


STUDY TIP 


In Example 2, you can also use a 
P-value to perform the hypothesis 
test. For instance, the test is a 
two-tailed test, so the P-value 
is equal to twice the area to the 
left of z = −1.11, or 

2(0.1335) = 0.267. 

Because 0.267 > 0.05, 
you should fail to 
reject H₀. 


EXAMPLE 2 
See TI-83/84 Plus steps on page 479. 

>» A Two-Sample z-Test for the Difference Between Means 


A credit card watchdog group claims that there is a difference in the mean 
credit card debts of households in New York and Texas. The results of a 
random survey of 250 households from each state are shown at the left. The 
two samples are independent. Do the results support the group’s claim? Use 
a = 0.05. (Adapted from PlasticRewards.com) 


> Solution 
The claim is “there is a difference in the mean credit card debts of households 
in New York and Texas.” So, the null and alternative hypotheses are 

H₀: μ₁ = μ₂ and Hₐ: μ₁ ≠ μ₂. (Claim) 


Because the test is a two-tailed test and the level of significance is α = 0.05, 
the critical values are −z₀ = −1.96 and z₀ = 1.96. The rejection regions are 
z < −1.96 and z > 1.96. Because both samples are large, s₁ and s₂ can be 
used in place of σ₁ and σ₂ to calculate the standard error. 

\[ \sigma_{\bar{x}_1 - \bar{x}_2} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{1045.70^2}{250} + \frac{1361.95^2}{250}} \approx 108.5983 \]

The standardized test statistic is 

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} \qquad \text{Use the z-test.} \]

\[ \approx \frac{(4446.25 - 4567.24) - 0}{108.5983} \qquad \text{Assume } \mu_1 = \mu_2, \text{ so } \mu_1 - \mu_2 = 0. \]

\[ \approx -1.11. \]


The graph at the left shows the location of the rejection regions and the 
standardized test statistic z. Because z is not in the rejection region, you should 
fail to reject the null hypothesis. 


Interpretation There is not enough evidence at the 5% level of significance 
to support the group’s claim that there is a difference in the mean credit card 
debts of households in New York and Texas. 


> Try It Yourself 2 


A survey indicates that the mean annual wages for forensic science technicians 
working for local and state governments are $53,300 and $51,910, respectively. 
The survey includes a randomly selected sample of size 100 from each 
government branch. The sample standard deviations are $6200 (local) and 
$5575 (state). The two samples are independent. At α = 0.10, is there enough 
evidence to conclude that there is a difference in the mean annual wages? 
(Adapted from U.S. Bureau of Labor Statistics) 


a. Identify the claim and state H₀ and Hₐ. 
b. Identify the level of significance α. 
c. Find the critical values and identify the rejection regions. 
d. Use the z-test to find the standardized test statistic z. Sketch a graph. 
e. Decide whether to reject the null hypothesis. 
f. Interpret the decision in the context of the original claim. 
Answer: Page A43 
Sample Statistics for Daily 
Cost of Meals and Lodging 
for Two Adults 


x̄₁ = $216 | x̄₂ = $222 
s₁ = $18  | s₂ = $24 
n₁ = 50   | n₂ = 35 


STUDY TIP 

Note that the TI-83/84 Plus 
displays P ≈ 0.1051. 
Because P > α, you 
should fail to reject 
the null hypothesis. 


Sample Statistics for Daily 
Cost of Meals and Lodging 
for Two Adults 


s₁ = $22  | s₂ = $18 
n₁ = 150  | n₂ = 200 


EXAMPLE 3 Report 33 


» Using Technology to Perform a Two-Sample z-Test 


A travel agency claims that the average daily cost of meals and lodging for 
vacationing in Texas is less than the same average cost for vacationing in 
Virginia. The table at the left shows the results of a random survey of 
vacationers in each state. The two samples are independent. At a = 0.01, is 
there enough evidence to support the claim? [H₀: μ₁ ≥ μ₂ and Hₐ: μ₁ < μ₂ 
(claim)] (Adapted from American Automobile Association) 


> Solution 


The top two displays show how to set up the hypothesis test using a TI-83/84 
Plus. The remaining displays show the possible results, depending on whether 
you select Calculate or Draw. 


TI-83/84 PLUS 

2-SampZTest 
Inpt: Data Stats 
σ1: 18 
σ2: 24 
x̄1: 216 
n1: 50 
x̄2: 222 
↓n2: 35 


TI-83/84 PLUS 

2-SampZTest 
↑σ2: 24 
x̄1: 216 
n1: 50 
x̄2: 222 
n2: 35 
μ1: ≠μ2 <μ2 >μ2 
Calculate Draw 


TI-83/84 PLUS 

2-SampZTest 
μ1<μ2 
z= -1.252799556 
p= .1051393971 
x̄1= 216 
x̄2= 222 
↓n1= 50 

Because the test is a left-tailed test and α = 0.01, the rejection region is 
z < −2.33. The standardized test statistic z ≈ −1.25 is not in the rejection 
region, so you should fail to reject the null hypothesis. 


Interpretation There is not enough evidence at the 1% level of significance 
to support the travel agency’s claim. 
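
The calculator output for this left-tailed test can be cross-checked with a few lines of Python. This is a sketch that mirrors the inputs shown on the 2-SampZTest screens (standard deviations 18 and 24, means 216 and 222, sample sizes 50 and 35); it is not the calculator's internal algorithm, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Inputs as entered on the 2-SampZTest screen.
s1, x1, n1 = 18, 216, 50
s2, x2, n2 = 24, 222, 35
alpha = 0.01

std_normal = NormalDist()

standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
z = (x1 - x2) / standard_error

# Left-tailed test: the P-value is the area to the left of z.
p_value = std_normal.cdf(z)

# Critical value for the rejection region z < z0.
critical_value = std_normal.inv_cdf(alpha)
reject_h0 = z < critical_value

print(f"z = {z:.9f}")
print(f"p = {p_value:.10f}")
print(f"critical value = {critical_value:.2f}, reject H0: {reject_h0}")
```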


> Try It Yourself 3 


A travel agency claims that the average daily cost of meals and lodging for 
vacationing in Alaska is greater than the same average cost for vacationing 
in Colorado. The table at the left shows the results of a random survey of 
vacationers in each state. The two samples are independent. At α = 0.05, is 
there enough evidence to support the claim? (Adapted from American Automobile 
Association) 


a. Use a TI-83/84 Plus to find the test statistic or the P-value. 
b. Decide whether to reject the null hypothesis. 
c. Interpret the decision in the context of the original claim. 
Answer: Page A43 
8.1 EXERCISES 


BUILDING BASIC SKILLS AND VOCABULARY 


1. What is the difference between two samples that are dependent and two 
samples that are independent? Give an example of each. 


2. Explain how to perform a two-sample z-test for the difference between the 
means of two populations using large independent samples. 


3. Describe another way you can perform a hypothesis test for the difference 
between the means of two populations using large independent samples. 


4. What conditions are necessary in order to use the z-test to test the difference 
between two population means? 


In Exercises 5-12, classify the two given samples as independent or dependent. 
Explain your reasoning. 


5. Sample 1: The SAT scores of 35 high school students who did not take an
SAT preparation course
Sample 2: The SAT scores of 40 high school students who did take an SAT
preparation course

6. Sample 1: The SAT scores of 44 high school students
Sample 2: The SAT scores of the same 44 high school students after taking
an SAT preparation course

7. Sample 1: The maximum bench press weights for 53 football players
Sample 2: The maximum bench press weights for the same football players
after completing a weight lifting program

8. Sample 1: The IQ scores of 60 females
Sample 2: The IQ scores of 60 males

9. Sample 1: The average speed of 23 powerboats using an old hull design
Sample 2: The average speed of 14 powerboats using a new hull design

10. Sample 1: The commute times of 10 workers who use their own vehicles
Sample 2: The commute times of the same 10 workers when they use public
transportation

11. The table shows the braking distances (in feet) for each of four different sets
of tires with the car’s antilock braking system (ABS) on and with ABS off.
The tests were done on ice with cars traveling at 15 miles per hour. (Source:
Consumer Reports)


2 3 4 
55 4 «61 
67 59 75 


12. The table shows the heart rates (in beats per minute) of five people before
and after exercising.

In Exercises 13-16, (a) find the test statistic, (b) find the standardized test statistic,
(c) decide whether the standardized test statistic is in the rejection region, and
(d) decide whether you should reject or fail to reject the null hypothesis. The
samples are random and independent.


13. Claim: μ₁ = μ₂; α = 0.01. Sample statistics: x̄₁ = 16, s₁ = 3.4, n₁ = 30 and
x̄₂ = 14, s₂ = 1.5, n₂ = 30


[FIGURE FOR EXERCISE 13: standard normal curve with critical values −z₀ = −2.575 and z₀ = 2.575.]
[FIGURE FOR EXERCISE 14: standard normal curve with critical value z₀ = 1.28.]


14. Claim: μ₁ > μ₂; α = 0.10. Sample statistics: x̄₁ = 500, s₁ = 40, n₁ = 100
and x̄₂ = 495, s₂ = 15, n₂ = 75


15. Claim: μ₁ < μ₂; α = 0.05. Sample statistics: x̄₁ = 2435, s₁ = 75, n₁ = 35
and x̄₂ = 2432, s₂ = 105, n₂ = 90


[FIGURE FOR EXERCISE 15: standard normal curve with critical value z₀ = −1.645.]
[FIGURE FOR EXERCISE 16: standard normal curve with critical value z₀ = 1.88.]


16. Claim: μ₁ ≤ μ₂; α = 0.03. Sample statistics: x̄₁ = 5004, s₁ = 136, n₁ = 144
and x̄₂ = 4895, s₂ = 215, n₂ = 156


In Exercises 17 and 18, use the given sample statistics to test the claim about the
difference between two population means μ₁ and μ₂ at the given level of
significance α.


17. Claim: μ₁ > μ₂; α = 0.01. Sample statistics: x̄₁ = 5.2, s₁ = 0.2, n₁ = 45 and
x̄₂ = 5, s₂ = 0.3, n₂ = 37


18. Claim: μ₁ ≠ μ₂; α = 0.05. Sample statistics: x̄₁ = 52, s₁ = 2.5, n₁ = 70 and
x̄₂ = 45, s₂ = 5.5, n₂ = 60


In Exercises 19 and 20, use the TI-83/84 Plus display to make a decision at the 
given level of significance. Make your decision using the standardized test statistic 
and using the P-value. Assume the sample sizes are equal. 


19. α = 0.05        20. α = 0.01

USING AND INTERPRETING CONCEPTS


Testing the Difference Between Two Means In Exercises 21-34,
(a) identify the claim and state H₀ and Hₐ, (b) find the critical value(s) and
identify the rejection region(s), (c) find the standardized test statistic z, (d) decide
whether to reject or fail to reject the null hypothesis, and (e) interpret the decision
in the context of the original claim. If convenient, use technology to solve the
problem. In each exercise, assume the samples are randomly selected and
independent.


21. Braking Distances To compare the braking distances for two types of tires, 
a safety engineer conducts 35 braking tests for each type. The results of the 
tests are shown in the figure. At α = 0.10, can the engineer support the claim
that the mean braking distances are different for the two types of tires? 
(Adapted from Consumer Reports) 


FIGURE FOR EXERCISE 21
Type A: x̄₁ = 42 feet, s₁ = 4.7 feet
Type B: x̄₂ = 45 feet, s₂ = 4.3 feet

FIGURE FOR EXERCISE 22
Type C: x̄₁ = 55 feet, s₁ = 5.3 feet
Type D: x̄₂ = 51 feet, s₂ = 4.9 feet


22. Braking Distances To compare the braking distances for two types of tires, 
a safety engineer conducts 50 braking tests for each type. The results of the 
tests are shown in the figure. At α = 0.10, can the engineer support the claim
that the mean braking distance for Type C is greater than the mean braking 
distance for Type D? (Adapted from Consumer Reports) 


23. Wind Energy An energy company wants to choose between two regions in 
a state to install energy-producing wind turbines. The company will choose 
Region A if its average wind speed is greater than that of Region B. To test 
the regions, the average wind speed is calculated for 60 days in each region. 
The results of the company’s research are shown in the figure. At α = 0.05,
should the company choose Region A? 


FIGURE FOR EXERCISE 23
Region A: x̄₁ = 13.2 mph, s₁ = 2.3 mph
Region B: x̄₂ = 12.5 mph, s₂ = 2.7 mph

FIGURE FOR EXERCISE 24
Region C: x̄₁ = 14.0 mph, s₁ = 2.9 mph
Region D: x̄₂ = 15.1 mph, s₂ = 3.3 mph


24. Wind Energy An energy company wants to choose between two regions in 
a State to install energy-producing wind turbines. A researcher claims that 
the wind speeds in Region C and Region D are equal. The company tests the 
regions by calculating the average wind speed for 75 days in Region C and 
80 days in Region D. The results of the company’s research are shown in the 
figure. At α = 0.03, can the company reject the researcher’s claim?

25. ACT Scores The mean ACT score for 43 male high school students is 21.1
and the standard deviation is 5.0. The mean ACT score for 56 female high
school students is 20.9 and the standard deviation is 4.7. At α = 0.01, can
you reject the claim that male and female high school students have equal
ACT scores? (Adapted from ACT Inc.)


26. ACT Scores A guidance counselor claims that high school students in a
college preparation program have higher ACT scores than those in a general
program. The mean ACT score for 49 high school students who are in a
college preparation program is 22.2 and the standard deviation is 4.8. The
mean ACT score for 44 high school students who are in a general program is
20.0 and the standard deviation is 5.4. At α = 0.10, can you support the
guidance counselor’s claim? (Adapted from ACT Inc.)


27. Home Prices A real estate agency says that the average home sales price in
Dallas, Texas is the same as in Austin, Texas. The average home sales price for
35 homes in Dallas is $240,993 and the standard deviation is $25,875. The
average home sales price for 35 homes in Austin is $249,237 and the standard
deviation is $27,110. At α = 0.10, is there enough evidence to reject the
agency’s claim? (Adapted from RealtyTrac Inc.)


28. Money Spent Eating Out A restaurant association says that households
in the United States headed by people under the age of 25 spend less on
food away from home than do households headed by people ages 65-74. The
mean amount spent by 30 households headed by people under the age of 25
is $1876 and the standard deviation is $113. The mean amount spent
by 30 households headed by people ages 65-74 is $1878 and the standard
deviation is $85. At α = 0.05, can you support the restaurant association’s
claim? (Adapted from Bureau of Labor Statistics)


29. Home Prices Refer to Exercise 27. Two more samples are taken, one from
Dallas and one from Austin. For 50 homes in Dallas, x̄₁ = $247,245 and
s₁ = $22,740. For 50 homes in Austin, x̄₂ = $239,150 and s₂ = $20,690. Use
α = 0.10. Do the new samples lead to a different conclusion?


30. Money Spent Eating Out Refer to Exercise 28. Two more samples are
taken, one from each age group. For 40 households headed by people under
the age of 25, x̄₁ = $2015 and s₁ = $124. For 40 households headed by
people ages 65-74, x̄₂ = $2099 and s₂ = $111. Use α = 0.05. Do the new
samples lead to a different conclusion?


31. Watching More TV? A sociologist claims that children ages 6-17 spent 
more time watching television in 1981 than children ages 6-17 do today. 
A study was conducted in 1981 to find the time that children ages 6-17 
spent watching television on weekdays. The results (in hours per 
weekday) are shown below. 


2.0 2.5 2.1 2.3 2.1 1.6 2.6 2.1 2.1 2.4
2.1 2.1 1.5 1.7 2.1 2.3 2.5 3.3 2.2 2.9
1.5 1.9 2.4 2.2 1.2 3.0 1.0 2.1 1.9 2.2


Recently, a similar study was conducted. The results are shown below.


2.9 1.8 0.9 1.6 2.0 1.7 2.5 1.1 1.6 2.0
1.4 1.7 1.7 1.9 1.6 1.7 1.2 2.0 2.6 1.6
1.5 2.5 1.6 2.1 1.7 1.8 1.1 1.4 1.2 2.3


At α = 0.025, can you support the sociologist’s claim? (Adapted from
University of Michigan’s Institute for Social Research)

32. Spending More Time Studying? A sociologist thinks that middle
school boys spent less time studying in 1981 than middle school boys do
today. A study was conducted in 1981 to find the time that middle school
boys spent studying on weekdays. The results (in minutes per weekday)
are shown below.


31.9 35.4 28.0 39.1 30.5 31.9 33.0 29.6 35.7 30.2
38.8 35.9 37.1 36.2 32.6 36.9 24.2 28.5 28.7 41.1
33.8 32.1 28.7 35.4 36.6 34.3 35.5 34.2 33.8 25.3
27.7 21.9 30.0 36.8 26.9


Recently, a similar study was conducted. The results are shown below.


44.7 54.6 41.1 46.7 43.0 46.6 42.9 48.7 50.0 47.9
47.2 58.0 51.0 41.1 49.6 51.3 39.0 45.6 49.8 54.4
47.1 45.5 52.8 49.4 47.2 54.8 40.2 45.4 48.6 50.0
51.5 55.0 44.7 42.2 52.0


At α = 0.03, can you support the sociologist’s claim? (Adapted from
University of Michigan’s Institute for Social Research)


33. Washer Diameters A production engineer claims that there is no
difference in the mean washer diameter manufactured by two different
methods. The first method produces washers with the following
diameters (in inches).


0.861 0.864 0.882 0.887 0.858 0.879 0.887 0.876 0.870
0.894 0.884 0.882 0.869 0.859 0.887 0.875 0.863 0.887
0.882 0.862 0.906 0.880 0.877 0.864 0.873 0.860 0.866
0.869 0.877 0.863 0.875 0.883 0.872 0.879 0.861


The second method produces washers with these diameters (in inches).


0.705 0.703 0.715 0.711 0.690 0.720 0.702 0.686 0.704
0.712 0.718 0.695 0.708 0.695 0.699 0.715 0.691 0.696
0.680 0.703 0.697 0.694 0.714 0.694 0.672 0.688 0.700
0.715 0.709 0.698 0.696 0.700 0.706 0.695 0.715


At α = 0.01, can you reject the production engineer’s claim?


34. Nut Diameters A production engineer claims that there is no
difference in the mean nut diameter manufactured by two different 
methods. The first method produces nuts with the following diameters 
(in centimeters). 


3.330 3.337 3.329 3.354 3.325 3.343 3.333 3.347 3.332 
3.358 3.353 3.335 3.341 3.331 3.327 3.326 3.337 3.336 
3.323 3.347 3.329 3.345 3.329 3.338 3.353 3.339 3.338 
3.338 3.350 3.320 3.364 3.340 3.348 3.339 3.336 3.321 
3.316 3.352 3.320 3.336 


The second method produces nuts with these diameters (in centimeters). 


3.513 3.490 3.498 3.504 3.483 3.512 3.494 3.514 3.495 
3.489 3.493 3.499 3.497 3.495 3.496 3.485 3.506 3.517 
3.484 3.498 3.522 3.505 3.501 3.491 3.500 3.499 3.475 
3.486 3.501 3.496 3.504 3.513 3.511 3.501 3.487 3.508 
3.515 3.505 3.496 3.505 


At α = 0.04, can you reject the production engineer’s claim?


35. Getting at the Concept Explain why the null hypothesis H₀: μ₁ = μ₂ is
equivalent to the null hypothesis H₀: μ₁ − μ₂ = 0.

36. Getting at the Concept Explain why the null hypothesis H₀: μ₁ = μ₂ is
equivalent to the null hypothesis H₀: μ₁ − μ₂ = 0.


In Exercises 37-40, use StatCrunch to help you test the claim about the
difference between two population means μ₁ and μ₂ at the given level of
significance α using the given sample statistics. Assume the samples are randomly
selected and independent.


37. Claim: μ₁ ≠ μ₂; α = 0.01. Sample statistics: x̄₁ = 64, s₁ = 5.4, n₁ = 50 and
x̄₂ = 60, s₂ = 7.5, n₂ = 45


38. Claim: μ₁ > μ₂; α = 0.10. Sample statistics: x̄₁ = 158.9, s₁ = 20.8, n₁ = 80
and x̄₂ = 155.3, s₂ = 24.6, n₂ = 80


39. Claim: μ₁ = μ₂; α = 0.05. Sample statistics: x̄₁ = 4.75, s₁ = 0.92, n₁ = 35
and x̄₂ = 5.07, s₂ = 0.73, n₂ = 40


40. Claim: μ₁ = μ₂; α = 0.03. Sample statistics: x̄₁ = 1740.28, s₁ = 193.80,
n₁ = 100 and x̄₂ = 1695.70, s₂ = 129.25, n₂ = 100


EXTENDING CONCEPTS


Testing a Difference Other Than Zero Sometimes a researcher is interested
in testing a difference in means other than zero. For instance, you may want to
determine if children today spend an average of 9 hours a week more in day care
(or preschool) than children did 20 years ago. In Exercises 41-44, you will test the
difference between two means using a null hypothesis of H₀: μ₁ − μ₂ = k,
H₀: μ₁ − μ₂ ≥ k, or H₀: μ₁ − μ₂ ≤ k. The standardized test statistic is still

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} \quad \text{where} \quad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
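
The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `two_sample_z` and the summary statistics are hypothetical, and the sketch assumes both samples are large (at least 30) so that the sample standard deviations can stand in for σ₁ and σ₂.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1, s1, n1, x2, s2, n2, k=0.0):
    """Standardized test statistic for H0: mu1 - mu2 = k (large independent samples)."""
    # Assumption: n1, n2 >= 30, so s1 and s2 are used in place of sigma1 and sigma2.
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return ((x1 - x2) - k) / std_error

# Hypothetical summary statistics, chosen only to illustrate the arithmetic.
z = two_sample_z(x1=30.0, s1=4.0, n1=64, x2=24.0, s2=3.0, n2=36, k=5.0)

# Two-tailed P-value, appropriate when the alternative is mu1 - mu2 != k.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 3), round(p_two_tailed, 4))
```

Setting `k=0` recovers the usual test of no difference between the means.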


41. Time in Day Care or Preschool In 1981, a study of 70 randomly selected 
children under 3 years old found that the mean length of time spent in day 
care or preschool per week was 11.5 hours with a standard deviation of 
3.8 hours. A recent study of 65 randomly selected children under 3 years old 
found that the mean length of time spent in day care or preschool per week 
was 20 hours and the standard deviation was 6.7 hours. At α = 0.01, test the
claim that children spend 9 hours a week more in day care or preschool today 
than they did in 1981. (Adapted from University of Michigan’s Institute for Social 
Research) 


42. Time Watching TV A recent study of 48 randomly selected children ages
6-8 found that the mean length of time spent watching television each week
was 12.95 hours and the standard deviation was 4.31 hours. The mean time 56
randomly selected children ages 9-11 spent watching television each week
was 15.02 hours and the standard deviation was 4.99 hours. At α = 0.05, test
the claim that the mean time per week children ages 6-8 watch television is
2 hours less than that of children ages 9-11. (Adapted from University of
Michigan’s Institute for Social Research)


43. Microbiologist Salaries Is the difference between the mean annual salaries
of microbiologists in Maryland and California more than $10,000? To decide,
you select a random sample of microbiologists from each state. The results of
each survey are shown in the figure. At α = 0.05, what should you conclude?
(Adapted from U.S. Bureau of Labor Statistics)

FIGURE FOR EXERCISE 43
Microbiologists in Maryland: x̄₁ = $94,980, s₁ = $8795, n₁ = 42
Microbiologists in California: x̄₂ = $80,830 [s₂ and n₂ are illegible in the source]

TABLES FOR EXERCISE 44
x̄₁ = $88,540, s₁ = $8225, n₁ = 30
x̄₂ = $72,870, s₂ = $7640, n₂ = 32


= 12 So = 1.5 
n= 140 | n> = 127 


TABLE FOR EXERCISE 46

44. Registered Nurse and Physician Assistant Salaries At α = 0.10, test the
claim that the difference between the mean salary for physician assistants
and the mean salary for registered nurses in New Jersey is greater than
$15,000. The results of a survey of randomly selected physician assistants and
registered nurses in New Jersey are shown at the left. (Adapted from U.S.
Bureau of Labor Statistics)


Constructing Confidence Intervals for μ₁ − μ₂ You can construct a
confidence interval for the difference between two population means μ₁ − μ₂ by
using the following if n₁ ≥ 30 and n₂ ≥ 30, or if both populations are normally
distributed and both population standard deviations are known. Also, the samples
must be randomly selected and independent.

$$(\bar{x}_1 - \bar{x}_2) - z_c\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_c\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
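
As a rough illustration of the interval above, the sketch below computes the endpoints from summary statistics. The function name `diff_means_ci` and the numbers are hypothetical, and the sketch assumes large samples so that s₁ and s₂ can be used in place of σ₁ and σ₂.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci(x1, s1, n1, x2, s2, n2, confidence=0.95):
    """Confidence interval for mu1 - mu2 from large independent samples."""
    # Critical value z_c leaves (1 - confidence) / 2 in each tail.
    z_c = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Assumption: s1 and s2 approximate sigma1 and sigma2 because n1, n2 >= 30.
    margin = z_c * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

# Hypothetical summary statistics, chosen only to illustrate the arithmetic.
low, high = diff_means_ci(x1=30.0, s1=4.0, n1=64, x2=24.0, s2=3.0, n2=36)
print(f"{low:.3f} < mu1 - mu2 < {high:.3f}")
```

If the whole interval lies on one side of zero, the data suggest that the two population means differ in that direction.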

In Exercises 45 and 46, construct the indicated confidence interval for μ₁ − μ₂.


45. DASH Diet and Systolic Blood Pressure A study was conducted to see if 
a specific diet and exercise program called the DASH (Dietary Approaches 
to Stop Hypertension) program, which emphasizes the consumption of fruits, 
vegetables, and low-fat dairy products, can reduce systolic blood pressure 
more than a traditional diet and exercise program does. After 6 months, 269 
people using the DASH diet had a mean systolic blood pressure of 123.1 mm 
Hg and a standard deviation of 9.9 mm Hg. After the same time period, 
268 people using a traditional diet and exercise program had a mean systolic 
blood pressure of 125 mm Hg and a standard deviation of 10.1 mm Hg. Before 
the study, each group had the same mean systolic blood pressure. Construct a 
95% confidence interval for μ₁ − μ₂, where μ₁ is the mean systolic blood
pressure for the group using the DASH diet and exercise program and μ₂ is
the mean systolic blood pressure for the group using the traditional diet and
exercise program. (Source: The Journal of the American Medical Association) 


46. Comparing Cancer Drugs In a study, two groups of patients with colorectal 
cancer are treated with different drugs. Group A is treated with the drug 
Irinotecan and Group B is treated with the drug Fluorouracil. The results 
of the study on the number of months in which the groups reported no 
cancer-related pain are shown at the left. Construct a 95% confidence interval 
for μ₁ − μ₂. (Adapted from The Lancet)


47. Make a Decision Refer to the study in Exercise 45. At α = 0.05, test the
claim that the mean systolic blood pressure for the group using the DASH 
diet and exercise program is less than the mean systolic blood pressure for 
the group using the traditional diet and exercise program. Would you 
recommend the DASH diet and exercise program over the traditional diet 
and exercise program? Explain. 


48. Make a Decision Refer to the study in Exercise 46. At α = 0.05, test the
claim that the mean number of months of cancer-related pain relief obtained 
with Irinotecan is greater than the mean number of months of cancer-related 
pain relief obtained with Fluorouracil. Would you recommend Irinotecan 
over Fluorouracil to relieve cancer-related pain? Explain. 


49. Compare the confidence interval you constructed in Exercise 45 with the 
hypothesis test result in Exercise 47. Explain why you would reject the null 
hypothesis if the confidence interval contains only negative numbers. 


50. Compare the confidence interval you constructed in Exercise 46 with the 
hypothesis test result in Exercise 48. Explain why you would reject the null 
hypothesis if the confidence interval contains only positive numbers.

Readability of Patient Education
Materials


Many patient education materials (PEMs) published by health organizations are written at above 
average readability levels, which may make it difficult for people to comprehend the information. 
Readability measures the grade level of education necessary to understand written material. 
According to the National Assessment of Adult Literacy, 14% of U.S. adults have below basic health 
literacy. Many studies are performed to determine the readability levels of different PEMs. 

The table below shows the results of three different studies. Study 1 evaluated PEMs 
published by the American Cancer Society. Study 2 evaluated PEMs published by the National 
Clearinghouse for Alcohol and Drug Information. Study 3 evaluated PEMs from the American 
Academy of Family Physicians. 


                      Study 1        Study 2
Sample size           n₁ = 51        n₂ = 52
Readability level     x̄₁ = 11.9      x̄₂ = 11.84
(by grade level)      s₁ = 2.2       s₂ = 0.94
EXERCISES


In Exercises 1-3, perform a two-sample z-test
to determine whether the mean readability
levels of the two indicated studies are different.
For each exercise, write your conclusion as a
sentence. Use α = 0.05.

1. Test the readability levels of PEMs in
Study 1 against those in Study 2.

2. Test the readability levels of PEMs in
Study 1 against those in Study 3.

3. Test the readability levels of PEMs in
Study 2 against those in Study 3.

4. In which comparisons in Exercises 1-3 did
you find a difference in readability levels?
Write a summary of your findings.

5. Construct a 95% confidence interval for
μ₁ − μ₂, where μ₁ is the mean readability
level in Study 1 and μ₂ is the mean
readability level in Study 2. Interpret
the results. (See Extending Concepts in
Section 8.1 Exercises.)

6. In a fourth study conducted by the Johns
Hopkins Oncology Center, the mean
readability level of 137 PEMs was 11.1,
with a standard deviation of 1.67.

(a) Test the mean readability level of this
study against the level of Study 1. Use
α = 0.01.

(b) Test the mean readability level of this
study against the level of Study 2. Use
α = 0.01.

WHAT YOU SHOULD LEARN


> How to perform a t-test for
the difference between two
population means μ₁ and μ₂
using small independent
samples


STUDY TIP 


You will need to know whether 
the variances of two 
populations are equal. 

In this chapter, each 
example and exercise 

will state whether the 
variances are equal. 

You will learn to test 

for differences in variance 
of two populations in 
Chapter 10. 


Testing the Difference Between Means (Small 
Independent Samples) 


The Two-Sample t-Test for the Difference Between Means 


>» THE TWO-SAMPLE t-TEST FOR THE DIFFERENCE 
BETWEEN MEANS 


As you have learned, in real life, it is often not practical to collect samples of size 30
or more from each of two populations. However, if both populations have a normal
distribution, you can still test the difference between their means. In this section,
you will learn how to use a t-test to test the difference between two population
means μ₁ and μ₂ using independent samples from each population. The following
conditions are necessary to use a t-test for small independent samples.


1. The samples must be randomly selected. 
2. The samples must be independent. 
3. Each population must have a normal distribution. 


When these conditions are met, the sampling distribution for the difference
between the sample means x̄₁ − x̄₂ is approximated by a t-distribution with mean
μ₁ − μ₂. So, you can use a two-sample t-test to test the difference between the
two population means μ₁ and μ₂. The standard error and the degrees of freedom
of the sampling distribution depend on whether the population variances σ₁² and
σ₂² are equal.


TWO-SAMPLE t-TEST FOR THE DIFFERENCE 
BETWEEN MEANS 


A two-sample t-test is used to test the difference between two population
means μ₁ and μ₂ when a sample is randomly selected from each population.
Performing this test requires each population to be normally distributed, and
the samples should be independent. The test statistic is x̄₁ − x̄₂ and the
standardized test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}.$$

Variances are equal: If the population variances are equal, then information
from the two samples is combined to calculate a pooled estimate of the
standard deviation σ̂.

$$\hat{\sigma} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

The standard error for the sampling distribution of x̄₁ − x̄₂ is

$$s_{\bar{x}_1 - \bar{x}_2} = \hat{\sigma} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad \text{Variances equal}$$

and d.f. = n₁ + n₂ − 2.

Variances are not equal: If the population variances are not equal, then the
standard error is

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad \text{Variances not equal}$$

and d.f. = smaller of n₁ − 1 and n₂ − 1.

PICTURING THE WORLD
A study published by the 
American Psychological 
Association in the journal 
Neuropsychology reported that 
children with musical training 
showed better verbal memory 
than children with no musical 
training. The study also showed 
that the longer the musical 
training, the better the verbal 
memory. Suppose you tried to 
duplicate the results as follows. 
A verbal memory test with a 
possible 100 points was 
administered to 90 children. 
Half had musical training, while 
the other half had no training 
and acted as the control group. 
The 45 children with training had 
an average score of 83.12 with a 
standard deviation of 5.7. The 45 
students in the control group had 
an average score of 79.9 with a 
standard deviation of 6.2. 


At α = 0.05, is there enough
evidence to support the claim 
that children with musical 
training have better verbal 
memory test scores than those 
without training? Assume the 
populations are normally 
distributed and the population 
variances are equal.

The requirements for the z-test described in Section 8.1 and the t-test
described in this section are shown in the flowchart below.


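Two-Sample Tests for Independent Samples

1. Are both sample sizes at least 30?
   Yes: Use the z-test.
   No: Go to question 2.

2. Are both populations normal?
   Yes: Go to question 3.
   No: You cannot use the z-test or the t-test.

3. Are both population standard deviations known?
   Yes: Use the z-test.
   No: Go to question 4.

4. Are the population variances equal?
   Yes: Use the t-test with
   $s_{\bar{x}_1 - \bar{x}_2} = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
   and d.f. = n₁ + n₂ − 2.
   No: Use the t-test with
   $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
   and d.f. = smaller of n₁ − 1 and n₂ − 1.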
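
The flowchart's decisions can be written as a small function. This is a sketch only: the function name `choose_two_sample_test` and its parameter names are hypothetical, and the returned strings are just labels for the branches of the flowchart.

```python
def choose_two_sample_test(n1, n2, populations_normal, sigmas_known, variances_equal):
    """Pick a two-sample test for independent samples by following the flowchart."""
    # Large samples: the z-test applies regardless of the population shape.
    if n1 >= 30 and n2 >= 30:
        return "z-test"
    # Small samples from non-normal populations: neither test applies.
    if not populations_normal:
        return "neither"
    # Normal populations with known standard deviations: z-test.
    if sigmas_known:
        return "z-test"
    # Otherwise a t-test; the variances decide the standard error and d.f.
    if variances_equal:
        return f"t-test, pooled standard error, d.f. = {n1 + n2 - 2}"
    return f"t-test, unpooled standard error, d.f. = {min(n1, n2) - 1}"

# Sample sizes 8 and 18, normal populations, unknown and unequal variances.
print(choose_two_sample_test(8, 18, True, False, False))
```

Writing the decision this way makes the order of the questions explicit: sample size is checked first, and the equal-variance question matters only once the t-test has been chosen.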


Using a Two-Sample t-Test for the Difference Between Means
(Small Independent Samples)

1. IN WORDS: State the claim mathematically and verbally. Identify the null
   and alternative hypotheses.
   IN SYMBOLS: State H₀ and Hₐ.

2. IN WORDS: Specify the level of significance.
   IN SYMBOLS: Identify α.

3. IN WORDS: Determine the degrees of freedom.
   IN SYMBOLS: d.f. = n₁ + n₂ − 2 or d.f. = smaller of n₁ − 1 and n₂ − 1

4. IN WORDS: Determine the critical value(s).
   IN SYMBOLS: Use Table 5 in Appendix B.

5. IN WORDS: Determine the rejection region(s).

6. IN WORDS: Find the standardized test statistic and sketch the sampling
   distribution.
   IN SYMBOLS: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$

7. IN WORDS: Make a decision to reject or fail to reject the null hypothesis.
   IN SYMBOLS: If t is in the rejection region, reject H₀. Otherwise, fail to
   reject H₀.

8. IN WORDS: Interpret the decision in the context of the original claim.


Sample Statistics for State Mathematics Test Scores
Teacher 1: x̄₁ = 473, s₁ = 39.7, n₁ = 8
Teacher 2: x̄₂ = 459, s₂ = 24.5, n₂ = 18

[Figure: t-distribution with critical values −t₀ = −1.895 and t₀ = 1.895 and standardized test statistic t ≈ 0.922.]

Sample Statistics for Annual Earnings
High school diploma: x̄₁ = $27,136, s₁ = $2318, n₁ = 15
Bachelor’s degree or higher: x̄₂ = $34,329, s₂ = $4962, n₂ = 12


EXAMPLE 1
See MINITAB steps on page 478.


>» A Two-Sample t-Test for the Difference Between Means 


The results of a state mathematics test for random samples of students taught 
by two different teachers at the same school are shown at the left. Can you 
conclude that there is a difference in the mean mathematics test scores for the 
students of the two teachers? Use α = 0.10. Assume the populations are
normally distributed and the population variances are not equal. 


> Solution 


The claim is “there is a difference in the mean mathematics test scores for the 
students of the two teachers.” So, the null and alternative hypotheses are 


H₀: μ₁ = μ₂ and Hₐ: μ₁ ≠ μ₂. (Claim)


Because the variances are not equal and the smaller sample size is 8, use
d.f. = 8 − 1 = 7. Because the test is a two-tailed test with d.f. = 7 and
α = 0.10, the critical values are −t₀ = −1.895 and t₀ = 1.895. The rejection
regions are t < −1.895 and t > 1.895. The standard error is

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{39.7^2}{8} + \frac{24.5^2}{18}} \approx 15.1776.$$

The standardized test statistic is

$$\begin{aligned}
t &= \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} && \text{Use the } t\text{-test.} \\
  &\approx \frac{(473 - 459) - 0}{15.1776} && \text{Assume } \mu_1 = \mu_2, \text{ so } \mu_1 - \mu_2 = 0. \\
  &\approx 0.922.
\end{aligned}$$

The graph at the left shows the location of the rejection regions and the
standardized test statistic t. Because t is not in the rejection region, you should
fail to reject the null hypothesis.


Interpretation There is not enough evidence at the 10% level of significance 
to support the claim that the mean mathematics test scores for the students of 
the two teachers are different. 
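
The arithmetic in this example can be reproduced with a short script. The following is a minimal sketch using the sample statistics and the table critical value quoted in the solution; the variable names are illustrative, and the critical value is typed in rather than computed.

```python
from math import sqrt

# Sample statistics from the example (variances assumed not equal).
x1, s1, n1 = 473, 39.7, 8
x2, s2, n2 = 459, 24.5, 18
t_critical = 1.895   # t-table value for d.f. = 7, two-tailed, alpha = 0.10

std_error = sqrt(s1**2 / n1 + s2**2 / n2)   # unpooled standard error
df = min(n1 - 1, n2 - 1)                    # smaller of n1 - 1 and n2 - 1
t = ((x1 - x2) - 0) / std_error             # H0 assumes mu1 - mu2 = 0

# Two-tailed test: reject H0 only if t lies beyond either critical value.
reject = abs(t) > t_critical
print(f"standard error = {std_error:.4f}, d.f. = {df}, t = {t:.3f}, reject H0: {reject}")
```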


> Try It Yourself 1 


The annual earnings of 15 people with a high school diploma and 12 people 
with a bachelor’s degree or higher are shown at the left. Can you conclude 
that there is a difference in the mean annual earnings based on level of 
education? Use α = 0.01. Assume the populations are normally distributed
and the population variances are not equal.


a. Identify the claim and state H₀ and Hₐ.
b. Identify the level of significance α and the degrees of freedom.
c. Find the critical values and identify the rejection regions.
d. Find the standardized test statistic t. Sketch a graph.
e. Decide whether to reject the null hypothesis.
f. Interpret the decision in the context of the original claim.
Answer: Page A43


EXAMPLE 2
See TI-83/84 Plus steps on page 479.

> A Two-Sample t-Test for the Difference Between Means

A manufacturer claims that the mean calling range (in feet) of its 2.4-GHz
cordless telephones is greater than that of its leading competitor. You perform a
study using 14 randomly selected phones from the manufacturer and 16 randomly
selected similar phones from its competitor. The results are shown at the left.
At α = 0.05, can you support the manufacturer’s claim? Assume the
populations are normally distributed and the population variances are equal.

Sample Statistics for Calling Range
Manufacturer: x̄₁ = 1275 ft, s₁ = 45 ft, n₁ = 14
Competitor: x̄₂ = 1250 ft, s₂ = 30 ft, n₂ = 16

> Solution

The claim is “the mean calling range of the manufacturer’s cordless phones 
is greater than that of its leading competitor.” So, the null and alternative 
hypotheses are 


H₀: μ₁ ≤ μ₂ and Hₐ: μ₁ > μ₂. (Claim)

STUDY TIP
It is important to note that when using a TI-83/84 Plus for the two-sample
t-test, select the Pooled: Yes input option when the variances are equal.


Because the variances are equal, d.f. = n₁ + n₂ − 2 = 14 + 16 − 2 = 28.
Because the test is a right-tailed test with d.f. = 28 and α = 0.05, the critical
value is t₀ = 1.701. The rejection region is t > 1.701. The standard error is

$$\begin{aligned}
s_{\bar{x}_1 - \bar{x}_2} &= \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \\
&= \sqrt{\frac{(14 - 1)(45)^2 + (16 - 1)(30)^2}{14 + 16 - 2}} \cdot \sqrt{\frac{1}{14} + \frac{1}{16}} \\
&\approx 13.8018.
\end{aligned}$$

The standardized test statistic is

$$\begin{aligned}
t &= \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} && \text{Use the } t\text{-test.} \\
  &\approx \frac{(1275 - 1250) - 0}{13.8018} && \text{Assume } \mu_1 = \mu_2, \text{ so } \mu_1 - \mu_2 = 0. \\
  &\approx 1.811.
\end{aligned}$$

The graph at the left shows the location of the rejection region and the
standardized test statistic t. Because t is in the rejection region, you should
decide to reject the null hypothesis.


Interpretation There is enough evidence at the 5% level of significance to
support the manufacturer’s claim that its phones have a greater calling range 
than its competitor’s. 
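
The pooled calculation in this example can be checked the same way. The sketch below uses the sample statistics and the table critical value quoted in the solution; the variable names are illustrative, and the critical value is typed in rather than computed.

```python
from math import sqrt

# Sample statistics from the example (variances assumed equal).
x1, s1, n1 = 1275, 45, 14
x2, s2, n2 = 1250, 30, 16
t_critical = 1.701   # t-table value for d.f. = 28, right-tailed, alpha = 0.05

# Pooled estimate of the common standard deviation.
pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
std_error = pooled_sd * sqrt(1 / n1 + 1 / n2)
df = n1 + n2 - 2
t = ((x1 - x2) - 0) / std_error   # H0 boundary: mu1 - mu2 = 0

# Right-tailed test: reject H0 only if t exceeds the critical value.
reject = t > t_critical
print(f"standard error = {std_error:.4f}, d.f. = {df}, t = {t:.3f}, reject H0: {reject}")
```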


> Try It Yourself 2 


A manufacturer claims that the watt usage of its 17-inch flat panel monitors is
less than that of its leading competitor. You perform a study and obtain the
results shown at the left. At α = 0.10, is there enough evidence to support the
manufacturer’s claim? Assume the populations are normally distributed and
the population variances are equal.

Sample Statistics for Watt Usage
Manufacturer: x̄₁ = 32 [s₁ and n₁ are illegible in the source]
Competitor: x̄₂ = 35 [s₂ and n₂ are illegible in the source]

a. Identify the claim and state H₀ and Hₐ.
b. Identify the level of significance α and the degrees of freedom.
c. Find the critical value and identify the rejection region.
d. Find the standardized test statistic t. Sketch a graph.
e. Decide whether to reject the null hypothesis.
f. Interpret the decision in the context of the original claim.


Answer: Page A43

8.2 EXERCISES


BUILDING BASIC SKILLS AND VOCABULARY


1. What conditions are necessary in order to use a t-test to test the difference
between two population means? 


2. Explain how to perform a two-sample t-test for the difference between the 
means of two populations. 


In Exercises 3-8, use Table 5 in Appendix B to find the critical value(s) for the
indicated alternative hypothesis, level of significance α, and sample sizes n₁ and
n₂. Assume that the samples are independent, normal, and random, and that the
population variances are (a) equal and (b) not equal.


3. Hₐ: μ₁ ≠ μ₂, α = 0.10, n₁ = 11, n₂ = 14
4. Hₐ: μ₁ > μ₂, α = 0.01, n₁ = 12, n₂ = 15
5. Hₐ: μ₁ < μ₂, α = 0.05, n₁ = 7, n₂ = 11
6. Hₐ: μ₁ ≠ μ₂, α = 0.01, n₁ = 19, n₂ = 22
7. Hₐ: μ₁ > μ₂, α = 0.05, n₁ = 13, n₂ = 8
8. Hₐ: μ₁ < μ₂, α = 0.10, n₁ = 9, n₂ = 4


In Exercises 9-12, (a) find the test statistic, (b) find the standardized test statistic, 
(c) decide whether the standardized test statistic is in the rejection region, and 
(d) decide whether you should reject or fail to reject the null hypothesis. Assume 
the populations are normally distributed. 


9. Claim: μ₁ = μ₂; α = 0.01.
Sample statistics: x̄₁ = 33.7, s₁ = 3.5,
n₁ = 12 and x̄₂ = 35.5, s₂ = 2.2, n₂ = 17.
Assume σ₁² = σ₂².


10. Claim: μ₁ < μ₂; α = 0.10.
Sample statistics: x̄₁ = 0.345, s₁ = 0.305,
n₁ = 11 and x̄₂ = 0.515, s₂ = 0.215, n₂ = 9.
Assume σ₁² = σ₂².
[Graph: t-distribution with a left-tailed rejection region, t₀ = −1.33]


11. Claim: μ₁ ≤ μ₂; α = 0.05.
Sample statistics: x̄₁ = 2410, s₁ = 175,
n₁ = 13 and x̄₂ = 2305, s₂ = 52, n₂ = 10.
Assume σ₁² ≠ σ₂².
[Graph: t-distribution with a right-tailed rejection region, t₀ = 1.833]


12. Claim: μ₁ > μ₂; α = 0.01.
Sample statistics: x̄₁ = 52, s₁ = 4.8,
n₁ = 16 and x̄₂ = 50, s₂ = 1.2, n₂ = 14.
Assume σ₁² ≠ σ₂².


USING AND INTERPRETING CONCEPTS


Testing the Difference Between Two Means In Exercises 13-22,
(a) identify the claim and state H₀ and Hₐ, (b) find the critical value(s) and identify
the rejection region(s), (c) find the standardized test statistic t, (d) decide whether
to reject or fail to reject the null hypothesis, and (e) interpret the decision in the
context of the original claim. If convenient, use technology to solve the problem. In
each exercise, assume the populations are normally distributed, and the samples are
independent and random.


13. Dogs and Cats A pet association claims that the mean annual costs of
routine veterinarian visits for dogs and cats are the same. The results for
samples of the two types of pets are shown below. At α = 0.10, can you
reject the pet association’s claim? Assume the population variances are not
equal. (Adapted from American Pet Products Association)


Sample Statistics for Annual Routine Vet Visits

x̄₁ = …      x̄₂ = …
s₁ = $28    s₂ = $15
n₁ = 7      n₂ = 10


14. Maximal Oxygen Consumption The maximal oxygen consumption is a way
to measure the physical fitness of an individual. It is the amount of oxygen in
milliliters a person uses per kilogram of body weight per minute. A medical
research center claims that athletes have a greater mean maximal oxygen
consumption than non-athletes. The results for samples of the two groups are
shown below. At α = 0.05, can you support the research center’s claim?
Assume the population variances are equal.


Sample Statistics for Maximal Oxygen Consumptions

x̄₁ = 56 ml/kg/min     x̄₂ = 47 ml/kg/min
s₁ = 4.9 ml/kg/min    s₂ = 3.1 ml/kg/min
n₁ = 23               n₂ = 21


15. Bumper Repair Cost In low speed crash tests, the mean bumper repair cost
of 6 randomly selected mini cars is $1621 with a standard deviation of
$493. In similar tests of 16 randomly selected midsize cars, the mean bumper
repair cost is $1895, with a standard deviation of $648. At α = 0.10, can you
conclude that the mean bumper repair cost is less for mini cars than for
midsize cars? Assume the population variances are equal. (Adapted from
Insurance Institute for Highway Safety)


16. Footwell Intrusion An insurance actuary claims that the mean footwell
intrusions for small pickups and small SUVs are equal. Crash tests at 40 miles
per hour were performed on 7 randomly selected small pickups and 13
randomly selected small SUVs. The amount that the footwell intruded on the
driver’s side was measured. The mean footwell intrusion for the small
pickups was 11.18 centimeters with a standard deviation of 4.53. The mean
footwell intrusion for the small SUVs was 9.52 centimeters with a standard
deviation of 3.84. At α = 0.01, can you reject the insurance actuary’s claim?
Assume the population variances are equal. (Adapted from Insurance Institute
for Highway Safety)
17. Annual Income A personnel director from Pennsylvania claims that the
mean household income is greater in Allegheny County than it is in Erie
County. In Allegheny County, a sample of 19 residents has a mean household
income of $48,800 and a standard deviation of $8800. In Erie County, a
sample of 15 residents has a mean household income of $44,000 and a
standard deviation of $5100. At α = 0.05, can you support the personnel
director’s claim? Assume the population variances are not equal. (Adapted
from U.S. Census Bureau)


18. Annual Income A personnel director from Florida claims that the mean
household income is greater in Hillsborough County than it is in Polk County.
In Hillsborough County, a sample of 17 residents has a mean household
income of $49,800 and a standard deviation of $4200. In Polk County, a
sample of 18 residents has a mean household income of $44,400 and a
standard deviation of $8600. At α = 0.01, can you support the personnel
director’s claim? Assume the population variances are not equal. (Adapted
from U.S. Census Bureau)


19. Tensile Strength The tensile strength of a metal is a measure of its 
ability to resist tearing when it is pulled lengthwise. A new experimental 
type of treatment produced steel bars with the following tensile 
strengths (in newtons per square millimeter). 


Experimental Method: 
391 383 333 378 368 
401 339 376 366 348 


The old method produced steel bars with the following tensile strengths 
(in newtons per square millimeter). 


Old Method: 
362 382 368 398 381 391 400 
410 396 411 385 385 395 


At α = 0.01, does the new treatment make a difference in the tensile
strength of steel bars? Assume the population variances are equal. 


20. Tensile Strength An engineer wants to compare the tensile strengths 
of steel bars that are produced using a conventional method and an 
experimental method. (The tensile strength of a metal is a measure of its 
ability to resist tearing when pulled lengthwise.) To do so, the engineer 
randomly selects steel bars that are manufactured using each method 
and records the following tensile strengths (in newtons per square 
millimeter). 


Experimental Method: 
395 389 421 394 407 411 389 402 422 
416 402 408 400 386 411 405 389 


Conventional Method: 
362 352 380 382 413 384 400 
378 419 379 384 388 372 383 


At α = 0.10, can the engineer claim that the experimental method
produces steel with greater mean tensile strength? Should the engineer 
recommend using the experimental method? Assume the population 
variances are not equal. 
21. Teaching Methods A new method of teaching reading is being tested
on third grade students. A group of third grade students is taught using
the new curriculum. A control group of third grade students is taught
using the old curriculum. The reading test scores for the two groups are
shown in the back-to-back stem-and-leaf plot.


 Old Curriculum |   | New Curriculum
              9 | 3 |
            9 9 | 4 | 3
9 8 8 4 3 3 2 1 | 5 | 2 4
7 6 4 2 2 1 0 0 | 6 | 0 1 1 4 7 7 7 7 7 8 9 9
                | 7 | 0 1 1 2 3 3 4 9
                | 8 | 2 4

Key: 9|4 = 49 (old curriculum); 4|3 = 43 (new curriculum)


At α = 0.10, is there enough evidence to conclude that the new method
of teaching reading produces higher reading test scores than the old
method does? Would you recommend changing to the new method?
Assume the population variances are equal.


22. Teaching Methods Two teaching methods and their effects on science
test scores are being reviewed. A group of students is taught in traditional
lab sessions. A second group of students is taught using interactive
simulation software. The science test scores for the two groups are
shown in the back-to-back stem-and-leaf plot.


      Traditional Lab |   | Interactive Simulation Software
                    4 | 6 |
9 9 8 8 7 6 6 3 2 1 0 | 7 | 0 4 5 5 7 7 8
      9 8 5 1 1 1 0 0 | 8 | 0 0 3 4 7 8 8 9 9
                  2 0 | 9 | 1 3 9

Key: 0|9 = 90 (traditional); 9|1 = 91 (interactive)


At α = 0.05, can you support the claim that the mean science test score
is lower for students taught using the traditional lab method than it is
for students taught using the interactive simulation software? Assume
the population variances are equal.


In Exercises 23-26, use StatCrunch to help you test the claim about the
difference between two population means μ₁ and μ₂ at the given level of
significance α using the given sample statistics. Assume that the populations are
normally distributed, and the samples are independent and random.


23. Claim: μ₁ = μ₂; α = 0.10. Sample statistics: x̄₁ = 186, s₁ = 38, n₁ = 15 and
x̄₂ = 194, s₂ = 44, n₂ = 9. Assume σ₁² = σ₂².

24. Claim: μ₁ ≠ μ₂; α = 0.01. Sample statistics: x̄₁ = 34, s₁ = 5.8, n₁ = 5 and
x̄₂ = 45, s₂ = 4.6, n₂ = 8. Assume σ₁² = σ₂².

25. Claim: μ₁ < μ₂; α = 0.05. Sample statistics: x̄₁ = 840, s₁ = 95, n₁ = 14 and
x̄₂ = 883, s₂ = 58, n₂ = 23. Assume σ₁² ≠ σ₂².

26. Claim: μ₁ = μ₂; α = 0.10. Sample statistics: x̄₁ = 98.5, s₁ = 10.2, n₁ = 15
and x̄₂ = 76.1, s₂ = 18.8, n₂ = 6. Assume σ₁² ≠ σ₂².
TABLE FOR EXERCISE 27

Sample Statistics for Kidney Transplants

x̄₁ = 1805 days    x̄₂ = 1629 days
s₁ = 166 days     s₂ = 204 days
n₁ = 21           n₂ = 11


TABLE FOR EXERCISE 29

Sample Statistics for Driving Distances

x̄₁ = …        x̄₂ = …
s₁ = 6 yd     s₂ = 12 yd
n₁ = 9        n₂ = 5


TABLE FOR EXERCISE 30

Sample Statistics for African Elephant Lifespans

x̄₁ = 56.0 yr    x̄₂ = 16.9 yr
s₁ = 8.6 yr     s₂ = 3.8 yr
n₁ = 20         n₂ = 12


EXTENDING CONCEPTS


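Constructing Confidence Intervals for μ₁ − μ₂ If the sampling
distribution for x̄₁ − x̄₂ is approximated by a t-distribution and the populations
have equal variances, you can construct a confidence interval for μ₁ − μ₂ by using
the following.

$$
(\bar{x}_1 - \bar{x}_2) - t_c \hat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
  < \mu_1 - \mu_2 <
(\bar{x}_1 - \bar{x}_2) + t_c \hat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
$$

where

$$
\hat{\sigma} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
$$

and d.f. = n₁ + n₂ − 2.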

In Exercises 27 and 28, construct a confidence interval for μ₁ − μ₂. Assume the
populations are approximately normal with equal variances.


27. Kidney Transplant Waiting Times To compare the mean times spent 
waiting for a kidney transplant for two age groups, you randomly select 
several people in each age group who have had a kidney transplant. The 
results are shown at the left. Construct a 95% confidence interval for the 
difference in mean times spent waiting for a kidney transplant for the two 
age groups. (Adapted from Organ Procurement and Transplantation Network) 


28. Heart Transplant Waiting Times To compare the mean times spent waiting 
for a heart transplant for two age groups, you randomly select several people 
in each age group who have had a heart transplant. The results are shown 
below. Construct a 99% confidence interval for the difference in mean times 
spent waiting for a heart transplant for the two age groups. (Adapted from 
Organ Procurement and Transplantation Network) 


Sample Statistics for Heart Transplants

x̄₁ = 171 days    x̄₂ = 169 days
s₁ = 85 days     s₂ = 11.5 days
n₁ = 26          n₂ = 24


Constructing Confidence Intervals for μ₁ − μ₂ If the sampling
distribution for x̄₁ − x̄₂ is approximated by a t-distribution and the population
variances are not equal, you can construct a confidence interval for μ₁ − μ₂ by
using the following.

$$
(\bar{x}_1 - \bar{x}_2) - t_c \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
  < \mu_1 - \mu_2 <
(\bar{x}_1 - \bar{x}_2) + t_c \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
$$

and d.f. is the smaller of n₁ − 1 and n₂ − 1
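
The unequal-variances interval can be sketched the same way. The function names and the sample values below are hypothetical, and t_c is again assumed to come from a t-table, this time using the smaller of n₁ − 1 and n₂ − 1 as the degrees of freedom.

```python
import math

def smaller_df(n1, n2):
    """Degrees of freedom used when the variances are not assumed equal."""
    return min(n1 - 1, n2 - 1)

def unpooled_confidence_interval(x1, s1, n1, x2, s2, n2, t_c):
    """Confidence interval for mu1 - mu2 without pooling the variances."""
    margin = t_c * math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

# Hypothetical data: n1 = 16 and n2 = 9, so d.f. = 8.
# t_c = 1.860 is the table value for a 90% interval with 8 d.f.
df = smaller_df(16, 9)
low, high = unpooled_confidence_interval(30.0, 4.0, 16, 27.0, 3.0, 9, t_c=1.860)
print(df, round(low, 3), round(high, 3))
```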


In Exercises 29 and 30, construct the indicated confidence interval for μ₁ − μ₂.
Assume the populations are approximately normal with unequal variances.


29. Golf To compare the mean driving distances for two golfers, you randomly 
select several drives from each golfer. The results are shown at the left. 
Construct a 90% confidence interval for the difference in mean driving 
distances for the two golfers. 


30. Elephants To compare the mean lifespans of African elephants in the wild 
and in a zoo, you randomly select several lifespans from both locations. The 
results are shown at the left. Construct a 95% confidence interval for the 
difference in mean lifespans of elephants in the wild and in a zoo. (Adapted 
from Science Magazine) 
Testing the Difference Between Means
(Dependent Samples)


WHAT YOU SHOULD LEARN

> How to perform a t-test to test the mean of the differences for a population
of paired data


The t-Test for the Difference Between Means


> THE t-TEST FOR THE DIFFERENCE BETWEEN MEANS


In Sections 8.1 and 8.2, you performed two-sample hypothesis tests with
independent samples using the test statistic x̄₁ − x̄₂ (the difference between the
means of the two samples). To perform a two-sample hypothesis test with
dependent samples, you will use a different technique. You will first find the
difference d for each data pair:

d = x₁ − x₂.    Difference between entries for a data pair

The test statistic is the mean d̄ of these differences:

$$
\bar{d} = \frac{\sum d}{n}
$$

Mean of the differences between paired data entries in the dependent samples

The following conditions are required to conduct the test. 
1. The samples must be randomly selected. 

2. The samples must be dependent (paired). 

3. Both populations must be normally distributed. 


If these requirements are met, then the sampling distribution for d̄, the mean
of the differences of the paired data entries in the dependent samples, is
approximated by a t-distribution with n − 1 degrees of freedom, where n is the
number of data pairs.


The following symbols are used for the t-test for μ_d. Although formulas are
given for the mean and standard deviation of differences, you should use a
technology tool to calculate these statistics.


n      The number of pairs of data

d      The difference between entries for a data pair, d = x₁ − x₂

μ_d    The hypothesized mean of the differences of paired data in
       the population

d̄      The mean of the differences between the paired data entries
       in the dependent samples

$$
\bar{d} = \frac{\sum d}{n}
$$

s_d    The standard deviation of the differences between the paired
       data entries in the dependent samples

$$
s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}
$$


STUDY TIP

You can also calculate the standard deviation of the differences between paired
data entries using the shortcut formula

$$
s_d = \sqrt{\frac{\sum d^2 - \dfrac{(\sum d)^2}{n}}{n - 1}}.
$$
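
The definition of s_d and the shortcut formula are algebraically equivalent, which a small Python sketch can confirm numerically. The function name `mean_and_sd_of_differences` and the paired values are hypothetical, chosen only for illustration; the last line cross-checks against the standard library's sample standard deviation.

```python
import math
import statistics

def mean_and_sd_of_differences(first, second):
    """Return (d, d_bar, s_d by definition, s_d by shortcut) for paired data."""
    d = [x1 - x2 for x1, x2 in zip(first, second)]   # d = x1 - x2 for each pair
    n = len(d)
    d_bar = sum(d) / n
    # Definition: squared deviations from the mean difference.
    sd_definition = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
    # Shortcut: uses only the sum of d and the sum of d squared.
    sd_shortcut = math.sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1))
    return d, d_bar, sd_definition, sd_shortcut

# Hypothetical paired data (five pairs).
first = [12, 15, 11, 14, 13]
second = [10, 16, 9, 11, 12]

d, d_bar, sd_def, sd_short = mean_and_sd_of_differences(first, second)
sd_library = statistics.stdev(d)   # sample standard deviation (n - 1 divisor)
print(d, d_bar, round(sd_def, 4), round(sd_short, 4), round(sd_library, 4))
```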
The manufacturer of an appetite suppressant claims that when its product is
taken while following a low-fat diet with regular exercise for 4 months, the
average weight loss is 20 pounds. To test this claim, you studied 12 randomly
selected dieters taking an appetite suppressant for 4 months. The dieters
followed a low-fat diet with regular exercise all 4 months. The results are
shown in the following table. (Adapted from NetHealth, Inc.)

[Table: Weights (in pounds) of 12 Dieters. The row and column layout is not
recoverable; the surviving entries are 168, 177, 196, 180, 229, 197, 252, 161,
192, 178, 181, 161, 209, 193.]

At α = 0.10, does your study provide enough evidence to reject the
manufacturer’s claim? Assume the weights are normally distributed.


When you use a t-distribution to approximate the sampling distribution for
d̄, the mean of the differences between paired data entries, you can use a t-test to
test a claim about the mean of the differences for a population of paired data.


t-TEST FOR THE DIFFERENCE BETWEEN MEANS 


A t-test can be used to test the difference of two population means when
a sample is randomly selected from each population. The requirements for
performing the test are that each population must be normal and each
member of the first sample must be paired with a member of the second
sample. The test statistic is

$$
\bar{d} = \frac{\sum d}{n}
$$

and the standardized test statistic is

$$
t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}.
$$

The degrees of freedom are

d.f. = n − 1.


GUIDELINES 


Using the t-Test for the Difference Between Means
(Dependent Samples)


1. IN WORDS: State the claim mathematically and verbally. Identify the null
   and alternative hypotheses.
   IN SYMBOLS: State H₀ and Hₐ.

2. IN WORDS: Specify the level of significance.
   IN SYMBOLS: Identify α.

3. IN WORDS: Determine the degrees of freedom.
   IN SYMBOLS: d.f. = n − 1

4. IN WORDS: Determine the critical value(s).
   IN SYMBOLS: Use Table 5 in Appendix B. If n > 29, use the last row (∞) in
   the t-distribution table.

5. IN WORDS: Determine the rejection region(s).

6. IN WORDS: Calculate d̄ and s_d.
   IN SYMBOLS:

$$
\bar{d} = \frac{\sum d}{n}, \qquad s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}
$$

7. IN WORDS: Find the standardized test statistic and sketch the sampling
   distribution.
   IN SYMBOLS:

$$
t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}
$$

8. IN WORDS: Make a decision to reject or fail to reject the null hypothesis.
   IN SYMBOLS: If t is in the rejection region, reject H₀. Otherwise, fail to
   reject H₀.

9. IN WORDS: Interpret the decision in the context of the original claim.
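
Steps 7 and 8 of the guidelines can be sketched in Python as follows. This is only an illustration: the function name `paired_t_decision` and the summary statistics are hypothetical, and the critical value t₀ is assumed to be looked up in a t-table by the caller (step 4) and passed in as a positive number.

```python
import math

def paired_t_decision(d_bar, s_d, n, t0, tail, mu_d=0.0):
    """Return (t, reject_h0) for the dependent-samples t-test.

    t0 is the positive critical value for n - 1 degrees of freedom.
    tail is "left", "right", or "two".
    """
    t = (d_bar - mu_d) / (s_d / math.sqrt(n))   # step 7
    if tail == "left":                          # step 8: compare with the region
        reject = t < -t0
    elif tail == "right":
        reject = t > t0
    elif tail == "two":
        reject = abs(t) > t0
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return t, reject

# Hypothetical summary statistics: n = 10 pairs, so d.f. = 9.
# t0 = 1.833 is the one-tailed table value at alpha = 0.05 with 9 d.f.
t, reject_h0 = paired_t_decision(d_bar=-2.0, s_d=2.5, n=10, t0=1.833, tail="left")
print(round(t, 3), reject_h0)
```

Keeping the table lookup outside the function follows the order of the guidelines, where the critical value and rejection region are fixed before the test statistic is computed.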
Before    After     d     d²
  …         …      −2      4
  22        25     −3      9
  25        25      0      0
  28        29     −1      1
  35        33      2      4
  32        34     −2      4
  30        35     −5     25
  27        30     −3      9


STUDY TIP


You can also use technology and a P-value to perform a hypothesis
test for the difference between means. For instance, in Example 1,
you can enter the data in MINITAB (as shown on page 478)
and find P = 0.026. Because P < α, you should decide to reject
the null hypothesis.

EXAMPLE 1 
See MINITAB
steps on page 478. 


> The t-Test for the Difference Between Means 


A shoe manufacturer claims that athletes can increase their vertical jump 
heights using the manufacturer’s new Strength Shoes®. The vertical jump 
heights of eight randomly selected athletes are measured. After the athletes 
have used the Strength Shoes® for 8 months, their vertical jump heights are 
measured again. The vertical jump heights (in inches) for each athlete are 
shown in the table. At α = 0.10, is there enough evidence to support the
manufacturer’s claim? Assume the vertical jump heights are normally 
distributed. (Adapted from Coaches Sports Publishing) 


> Solution 

The claim is that “athletes can increase their vertical jump heights.” In other 
words, the manufacturer claims that an athlete’s vertical jump height before 
using the Strength Shoes® will be less than the athlete’s vertical jump height 
after using the Strength Shoes®. Each difference is given by 


d = (jump height before shoes) — (jump height after shoes). 
The null and alternative hypotheses are 
H₀: μ_d = 0 and Hₐ: μ_d < 0. (Claim)


Because the test is a left-tailed test, α = 0.10, and d.f. = 8 − 1 = 7, the
critical value is t₀ = −1.415. The rejection region is t < −1.415. Using the
table at the left, you can calculate d̄ and s_d as follows. Notice that the shortcut
formula is used to calculate the standard deviation.

$$
\bar{d} = \frac{\sum d}{n} = \frac{-14}{8} = -1.75
$$

$$
s_d = \sqrt{\frac{\sum d^2 - \dfrac{(\sum d)^2}{n}}{n - 1}}
    = \sqrt{\frac{56 - \dfrac{(-14)^2}{8}}{8 - 1}}
    \approx 2.1213
$$

The standardized test statistic is

$$
t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}
  = \frac{-1.75 - 0}{2.1213 / \sqrt{8}}
  \approx -2.333.
$$

Here the t-test is used, and the null hypothesis assumes μ_d = 0.

The graph at the right shows the location of the rejection region and the
standardized test statistic t. Because t is in the rejection region, you should
decide to reject the null hypothesis.

Interpretation There is enough evidence at the 10% level of significance to 
support the shoe manufacturer’s claim that athletes can increase their vertical 
jump heights using the new Strength Shoes®. 
40-Yard Dash Times (in seconds) for Try It Yourself 1, one row per athlete

4.85    4.78
4.90    4.90
5.08    5.05
4.72    4.65
4.62    4.64
4.54    4.50
5.25    5.24
5.18    5.27
4.81    4.75
4.57    4.43
4.63    4.61
4.77    4.82


STUDY TIP


If you prefer to use a technology tool for this type of test, enter the
data in two columns and form a third column in which you calculate the
difference for each pair. You can now perform a one-sample t-test on the
difference column, as shown in Chapter 7.
> Try It Yourself 1


A shoe manufacturer claims that athletes can decrease their times in the
40-yard dash using the manufacturer’s new Strength Shoes®. The 40-yard dash
times of 12 randomly selected athletes are measured. After the athletes have
used the Strength Shoes® for 8 months, their 40-yard dash times are measured
again. The times (in seconds) are listed at the left. At α = 0.05, is there enough
evidence to support the manufacturer’s claim? Assume the times are normally
distributed. (Adapted from Coaches Sports Publishing)


a. Identify the claim and state H₀ and Hₐ.
b. Identify the level of significance α and the degrees of freedom.
c. Find the critical value t₀ and identify the rejection region.
d. Calculate d̄ and s_d.
e. Use the t-test to find the standardized test statistic t. Sketch a graph.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.


Answer: Page A43 


Note that in Example 1 it is possible that the vertical jump height improved 
because of other reasons. Many advertisements misuse statistical results by 
implying a cause-and-effect relationship that has not been substantiated by testing. 


EXAMPLE 2    Report 36


> The t-Test for the Difference Between Means 


A state legislator wants to determine whether her performance rating (0-100) 
has changed from last year to this year. The following table shows the 
legislator’s performance ratings from the same 16 randomly selected voters for 
last year and this year. At α = 0.01, is there enough evidence to conclude that
the legislator’s performance rating has changed? Assume the performance 
ratings are normally distributed. 


> Solution 
If there is a change in the legislator’s rating, there will be a difference between
“this year’s” ratings and “last year’s” ratings. Because the legislator wants to
see if there is a difference, the null and alternative hypotheses are


H₀: μ_d = 0 and Hₐ: μ_d ≠ 0. (Claim)


Because the test is a two-tailed test, α = 0.01, and d.f. = 16 − 1 = 15, the
critical values are −t₀ = −2.947 and t₀ = 2.947. The rejection regions are
t < −2.947 and t > 2.947.
Before    After      d      d²
  60        56       4      16
  54        48       6      36
  78        70       8      64
  84        60      24     576
  91        85       6      36
  25        40     −15     225
  50        40      10     100
  65        55      10     100
  68        80     −12     144
  81        75       6      36
  75        78      −3       9
  45        50      −5      25
  62        50      12     144
  …         …       −6      36
  58        53       5      25
  63        60       3       9
                 Σ = 53   Σ = 1581


Using the table at the left, you can calculate d̄ and s_d as shown below.

$$
\bar{d} = \frac{\sum d}{n} = \frac{53}{16} = 3.3125
$$

$$
s_d = \sqrt{\frac{\sum d^2 - \dfrac{(\sum d)^2}{n}}{n - 1}}
    = \sqrt{\frac{1581 - \dfrac{53^2}{16}}{16 - 1}}
    \approx 9.6797
$$

The standardized test statistic is

$$
t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}
  = \frac{3.3125 - 0}{9.6797 / \sqrt{16}}
  \approx 1.369.
$$

Here the t-test is used, and the null hypothesis assumes μ_d = 0.

The graph at the right shows the location of the rejection region and the
standardized test statistic t. Because t is not in the rejection region, you should
fail to reject the null hypothesis.


Interpretation There is not enough evidence at the 1% level of significance
to conclude that the legislator’s performance rating has changed.
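
The numbers in Example 2 can be reproduced from the two column totals alone. The Python sketch below uses Σd = 53, Σd² = 1581, and n = 16 from the example together with the shortcut formula; the variable names are illustrative, and the critical value 2.947 is copied from the solution rather than computed.

```python
import math

n = 16            # number of data pairs
sum_d = 53        # sum of the differences
sum_d_sq = 1581   # sum of the squared differences

d_bar = sum_d / n
# Shortcut formula for the standard deviation of the differences.
s_d = math.sqrt((sum_d_sq - sum_d**2 / n) / (n - 1))
# Standardized test statistic under H0: mu_d = 0.
t = (d_bar - 0) / (s_d / math.sqrt(n))

t0 = 2.947                 # critical value from the example (two-tailed, d.f. = 15)
reject_h0 = abs(t) > t0    # two-tailed rejection regions: t < -t0 or t > t0

print(round(d_bar, 4), round(s_d, 4), round(t, 3), reject_h0)
```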


> Try It Yourself 2 


A medical researcher wants to determine whether a drug changes the body’s
temperature. Seven test subjects are randomly selected, and the body
temperature (in degrees Fahrenheit) of each is measured. The subjects are
then given the drug and, after 20 minutes, the body temperature of each is
measured again. The results are listed below. At α = 0.05, is there enough
evidence to conclude that the drug changes the body’s temperature? Assume
the body temperatures are normally distributed.


Subject                 1       2      3      4      5      6       7
First measurement     101.8   98.5   98.1   99.4   98.9   100.2   97.9
Second measurement     99.2   98.4   98.2   99     98.6    99.7   97.8


a. Identify the claim and state H₀ and Hₐ.
b. Identify the level of significance α and the degrees of freedom.
c. Find the critical values and identify the rejection regions.
d. Calculate d̄ and s_d.
e. Use the t-test to find the standardized test statistic t. Sketch a graph.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.


Answer: Page A43 



8.3 EXERCISES


BUILDING BASIC SKILLS AND VOCABULARY


1. What conditions are necessary in order to use the dependent samples 
t-test for the mean of the difference of two populations? 


2. Explain what the symbols d̄ and s_d represent.


In Exercises 3-8, test the claim about the mean of the difference of two populations. 
Use a t-test for dependent, random samples at the given level of significance with 
the given statistics. Is the test right-tailed, left-tailed, or two-tailed? Assume the 
populations are normally distributed. 


3. Claim: μ_d < 0; α = 0.05. Statistics: d̄ = 1.5, s_d = 3.2, n = 14
4. Claim: μ_d = 0; α = 0.01. Statistics: d̄ = 3.2, s_d = 8.45, n = 8
5. Claim: μ_d ≤ 0; α = 0.10. Statistics: d̄ = 6.5, s_d = 9.54, n = 16
6. Claim: μ_d > 0; α = 0.05. Statistics: d̄ = 0.55, s_d = 0.99, n = 28
7. Claim: μ_d = 0; α = 0.01. Statistics: d̄ = −2.3, s_d = 12, n = 15
8. Claim: μ_d ≠ 0; α = 0.10. Statistics: d̄ = −1, s_d = 2.75, n = 20


USING AND INTERPRETING CONCEPTS


Testing the Difference Between Two Means In Exercises 9-18,
(a) identify the claim and state H₀ and Hₐ, (b) find the critical value(s) and
identify the rejection region(s), (c) calculate d̄ and s_d, (d) find the standardized test
statistic t, (e) decide whether to reject or fail to reject the null hypothesis, and
(f) interpret the decision in the context of the original claim. If convenient, use
technology to solve the problem. For each randomly selected sample, assume the
population is normally distributed.


9. Grammatical Errors A teacher claims that a grammar seminar will help 
students reduce the number of grammatical errors they make when writing 
a 1000-word essay. The table shows the number of grammatical errors made 
by seven students before participating in the seminar and after participating 
in the seminar. At α = 0.01, is there enough evidence to conclude that the
seminar reduced the number of errors? 


10. SAT Scores An SAT preparation course claims to improve the
test scores of students. The table shows the critical reading scores for
10 students the first two times they took the SAT. Before taking the
SAT for the second time, the students took a course to try to improve
their critical reading SAT scores. Test the claim at α = 0.01.


Student                 1     2     3     4     5     6     7     8     9     10
Score on first SAT     308   456   352   433   306   471   422   370   320   418
Score on second SAT    400   524   409   491   348   583   451   408   391   450
11. Losing Weight A nutritionist claims that a particular exercise program
will help participants lose weight after one month. The table shows the
weights of 12 adults before participating in the exercise program and
one month after participating in the exercise program. At α = 0.10, can
you conclude that the exercise program helps participants lose weight?

[Table: weights of 12 adults before and after the exercise program. The row
and column layout is not recoverable; the surviving entries are 232, 188, 206,
140, 138, 215, 137, 145, 169.]


12. Batting Averages A coach suggests that a baseball clinic will help
players raise their batting averages. The table shows the batting
averages of 14 players before participating in the clinic and two months
after participating in the clinic. At α = 0.05, is there enough evidence to
conclude that the clinic helped the players raise their batting averages?

[Table: batting averages of 14 players before and after the clinic. The row
and column layout is not recoverable; the surviving entries are 0.380, 0.380,
0.316, 0.315, 0.300, 0.280, 0.310, 0.302, 0.298, 0.270, 0.300, 0.282, 0.325,
0.256, 0.330, 0.260, 0.330, 0.340, 0.336, 0.325.]


13. Headaches A physical therapist suggests that soft tissue therapy and
spinal manipulation help to reduce the lengths of time patients suffer
from headaches. The table shows the number of hours per day 11 patients
suffered from headaches before and after 7 weeks of receiving treatment.
At α = 0.01, is there enough evidence to support the therapist’s claim?
(Adapted from The Journal of the American Medical Association)


Patient    1     2     3     4     5     6     7     8     9     10    11
Before    2.8   2.4   2.8   2.6   2.7   2.9   3.2   2.9   4.1   1.6   2.5
After     1.6   1.3   1.6   1.4   1.5   1.6   1.7   1.6   1.8   1.2   1.4
14. Grip Strength A physical therapist suggests that one 600-mg dose of
Vitamin C will increase muscular endurance. The table shows the number
of repetitions 15 males made on a hand dynamometer (measures grip
strength) until the grip strengths in three consecutive trials were 50% of
their maximum grip strength. At α = 0.05, test the claim that Vitamin C
will increase muscular endurance. (Adapted from Journal of Sports
Medicine and Physical Fitness)

[Table: number of repetitions for each of the 15 participants. The row and
column layout is not recoverable; the surviving entries, in extraction order,
are 145, 185, 387, 593, 248, 245, 349, 902, 363, 258, 288, 526, 180, 159, 122,
264, 172, 278, 1052, 218, 117, 185.]

15. Blood Pressure A pharmaceutical company guarantees that its new drug
reduces systolic blood pressure. The table shows the systolic blood pressures
(in millimeters of mercury) of eight patients before taking the new drug and
two hours after taking the drug. At α = 0.05, can you conclude that the new
drug reduces systolic blood pressure?

[Table: systolic blood pressures of eight patients before and after taking the
drug. The row and column layout is not recoverable; the surviving entries, in
extraction order, are 171, 186, 162, 165, 167, 155, 167, 175, 148, 148, 144,
152, 134.]


16. Plaque Thickness A researcher believes that garlic can reduce the thickness
of plaque buildup in arteries. The table shows the thicknesses (in millimeters)
of plaque in the carotid arteries of nine patients with mild atherosclerosis
before taking garlic on a daily basis and after four years of taking garlic on a
daily basis. At α = 0.05, can you conclude that garlic reduces the thickness
of plaque buildup?


Patient      1      2      3      4      5      6      7      8      9
Thickness   0.78   0.65   0.73   0.85   0.68   0.80   0.64   0.72   0.82
17. Product Ratings A company wants to determine whether its consumer
product ratings (0-10) have changed from last year to this year. The table
shows the company’s product ratings from the same eight consumers for last
year and this year. At α = 0.05, is there enough evidence to conclude that the
product ratings have changed?


Ratings (one column per consumer, one row per year):

5    7    2    3    9    10    8    7
5    9    4    6    9     9    9    8


18. Points Per Game The scoring averages (in points per game) of 10
professional basketball players for their rookie and sophomore seasons
are shown in the table below. At α = 0.10, is there enough evidence to
conclude that the scoring averages have changed? (Source: National
Basketball Association)


Player    1      2      3      4      5      6      7      8      9      10
         17.5   14.7   16.9   16.3   20.4   18.9   14.6    6.3   14.2   12.5
         18.5   13.9   16.1   15.3   16.8   13.0   11.9   11.8   11.1   11.1

@® In Exercises 19 and 20, use StatCrunch to help you test the claim about the 
difference between two population means. For each randomly selected sample, 
assume the population is normally distributed. 


19. 


Cholesterol Levels A food manufacturer claims that eating its new cereal 
as part of a daily diet lowers total blood cholesterol levels. The table shows 
the total blood cholesterol levels (in milligrams per deciliter of blood) of 
seven patients before eating the cereal and after one year of eating the 
cereal as part of their diets. At a = 0.05, can you conclude that the new 
cereal lowers total blood cholesterol levels? 


20. Obstacle Course On a television show, eight contestants try to lose the
highest percentage of weight in order to win a cash prize. As part of the show,
the contestants are timed as they run an obstacle course. The table shows the
times (in seconds) of the contestants at the beginning of the
season and at the end of the season. At $\alpha = 0.01$, is there enough evidence
to conclude that the contestants’ times have changed?


Contestant       | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8
Time (beginning) | 130.2 | 104.8 | 100.1 | 136.4 | 125.9 | 122.6 | 150.4 | 158.2
Time (end)       | 121.5 | 100.7 | 90.2  | 135.0 | 112.1 | 120.5 | 139.8 | 142.9
EXTENDING CONCEPTS


21. In Exercise 15, use a P-value to perform the hypothesis test. Compare your
result with the result obtained using rejection regions. Are they the same?


22. In Exercise 18, use a P-value to perform the hypothesis test. Compare your
result with the result obtained using rejection regions. Are they the same?


Constructing Confidence Intervals for $\mu_d$ To construct a confidence
interval for $\mu_d$, use the following inequality.


$$\bar{d} - t_c \frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_c \frac{s_d}{\sqrt{n}}$$
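The inequality above can be sketched in Python as follows. This is a minimal illustration, not the textbook's procedure: the function name `paired_mean_ci` and the sample values are hypothetical, and the critical value `t_c` is assumed to be looked up by the reader (for example, in a t-distribution table with n - 1 degrees of freedom) and passed in.

```python
from math import sqrt
from statistics import mean, stdev


def paired_mean_ci(first, second, t_c):
    """Confidence interval for mu_d, the mean of the paired differences.

    t_c is the critical value for n - 1 degrees of freedom; it is supplied
    by the caller because the standard library has no t-distribution.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(first, second)]  # difference within each pair
    n = len(d)
    d_bar = mean(d)    # mean of the differences
    s_d = stdev(d)     # sample standard deviation of the differences
    margin = t_c * s_d / sqrt(n)
    return d_bar - margin, d_bar + margin


# Hypothetical paired measurements (not data from the exercises).
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after = [10.0, 14.0, 11.0, 11.0, 12.0]

# 90% confidence level with 4 degrees of freedom: t_c = 2.132 (table value).
low, high = paired_mean_ci(before, after, t_c=2.132)
print(f"{low:.3f} < mu_d < {high:.3f}")
```

Passing `t_c` explicitly keeps the sketch free of third-party dependencies; a statistics package could compute the critical value instead.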


In Exercises 23 and 24, construct the indicated confidence interval for $\mu_d$. Assume
the populations are normally distributed.


23. Drug Testing A sleep disorder specialist wants to test the effectiveness
of a new drug that is reported to increase the number of hours of sleep
patients get during the night. To do so, the specialist randomly selects
16 patients and records the number of hours of sleep each gets with and
without the new drug. The results of the two-night study are listed
below. Construct a 90% confidence interval for $\mu_d$.


2.0 


Sill 


6.6 


confidence interval for py. 


3.4 


5.2 


718 


14. 


3:5 


6.0 
3.5


5.0 


7.2 


3.4 


5.8 


6.5 


3.7 


4.5 


6.5 


3.7 


4.2 


4.4 


3.8 


4.2 


5.6 


5.1 


4.8 


4.7 


3.9 


4.7 


3.9) 


5.1 


2.9 


mil 


3.9 


5.7 


5.2 


4.5 


4.7 


4.0 


6.2 


24. Herbal Medicine Testing An herbal medicine is tested on 14 randomly
selected patients with sleeping disorders. The table shows the number of
hours of sleep patients got during one night without using the herbal
medicine and the number of hours of sleep the patients got on another
night after the herbal medicine had been administered. Construct a 95%
confidence interval for $\mu_d$.
Testing the Difference Between Proportions


WHAT YOU SHOULD LEARN

> How to perform a z-test for the difference between two
population proportions $p_1$ and $p_2$


Two-Sample z-Test for the Difference Between Proportions


STUDY TIP


The following symbols are used in the z-test for $p_1 - p_2$. See
Sections 4.2 and 5.5 to review the binomial distribution.


Symbol                     | Description
$p_1$, $p_2$               | Population proportions
$x_1$, $x_2$               | Number of successes in each sample
$n_1$, $n_2$               | Size of each sample
$\hat{p}_1$, $\hat{p}_2$   | Sample proportions of successes
$\bar{p}$                  | Weighted estimate for $p_1$ and $p_2$


> TWO-SAMPLE z-TEST FOR THE DIFFERENCE
BETWEEN PROPORTIONS


In this section, you will learn how to use a z-test to test the difference between
two population proportions $p_1$ and $p_2$ using a sample proportion from each
population. If a claim is about two population parameters $p_1$ and $p_2$, then some
possible pairs of null and alternative hypotheses are


$H_0\colon p_1 = p_2$, $H_a\colon p_1 \neq p_2$;
$H_0\colon p_1 \le p_2$, $H_a\colon p_1 > p_2$; and
$H_0\colon p_1 \ge p_2$, $H_a\colon p_1 < p_2$.

Regardless of which hypotheses you use, you always assume there is no difference
between the population proportions, or $p_1 = p_2$.

For instance, suppose you want to determine whether the proportion of 
female college students who earn a bachelor’s degree in four years is different 
from the proportion of male college students who earn a bachelor’s degree in 
four years. The following conditions are necessary to use a z-test to test such a 
difference. 

1. The samples must be randomly selected. 

2. The samples must be independent. 

3. The samples must be large enough to use a normal sampling distribution.
That is, $n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, and $n_2 q_2 \ge 5$.

If these conditions are met, then the sampling distribution for $\hat{p}_1 - \hat{p}_2$, the
difference between the sample proportions, is a normal distribution with mean


$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$$


and standard error


$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}.$$


Notice that you need to know the population proportions to calculate the
standard error. Because a hypothesis test for $p_1 - p_2$ is based on the assumption
that $p_1 = p_2$, you can calculate a weighted estimate of $p_1$ and $p_2$ using


$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}, \quad \text{where } x_1 = n_1 \hat{p}_1 \text{ and } x_2 = n_2 \hat{p}_2.$$

With the weighted estimate $\bar{p}$, the standard error of the sampling distribution for
$\hat{p}_1 - \hat{p}_2$ is


$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \quad \text{where } \bar{q} = 1 - \bar{p}.$$


Also observe that you need to know the population proportions to verify that the
samples are large enough to be approximated by the normal distribution. But
when determining whether the z-test can be used for the difference between
proportions for a binomial experiment, you should use $\bar{p}$ in place of $p_1$ and $p_2$
and use $\bar{q}$ in place of $q_1$ and $q_2$.
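As a rough sketch of these formulas, the following Python snippet computes the weighted estimate, the standard error, and the sample-size check. The function names and the counts are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pooled_estimate(x1, n1, x2, n2):
    """Return (p_bar, q_bar): the weighted estimate of p1 and p2, and 1 - p_bar."""
    p_bar = (x1 + x2) / (n1 + n2)
    return p_bar, 1 - p_bar


def pooled_standard_error(p_bar, q_bar, n1, n2):
    """Standard error of p1_hat - p2_hat under the assumption p1 = p2."""
    return sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))


def normal_approximation_ok(p_bar, q_bar, n1, n2, minimum=5):
    """Check that n1*p_bar, n1*q_bar, n2*p_bar, and n2*q_bar are all at least 5."""
    return all(v >= minimum for v in (n1 * p_bar, n1 * q_bar, n2 * p_bar, n2 * q_bar))


# Hypothetical counts: 60 successes out of 100, and 45 successes out of 100.
x1, n1, x2, n2 = 60, 100, 45, 100
p_bar, q_bar = pooled_estimate(x1, n1, x2, n2)
se = pooled_standard_error(p_bar, q_bar, n1, n2)
ok = normal_approximation_ok(p_bar, q_bar, n1, n2)
print(f"p_bar = {p_bar:.4f}, q_bar = {q_bar:.4f}, SE = {se:.4f}, large enough: {ok}")

# A small hypothetical sample where the check fails (n2 * p_bar is below 5).
small_p, small_q = pooled_estimate(2, 8, 1, 6)
small_ok = normal_approximation_ok(small_p, small_q, 8, 6)
```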
A medical research team conducted a study to test whether a drug lowers the
chance of getting diabetes. In the study, 2623 people took the drug and
2646 people took a placebo. The results are shown below.
(Source: The New England Journal of Medicine)


[Figure: Got Diabetes, Drug vs. Placebo]


At $\alpha = 0.05$, can you support the claim that the drug lowers
the chance of getting diabetes?
HYPOTHESIS TESTING WITH TWO SAMPLES


If the sampling distribution for $\hat{p}_1 - \hat{p}_2$ is normal, you can use a two-sample
z-test to test the difference between two population proportions $p_1$ and $p_2$.


TWO-SAMPLE z-TEST FOR THE DIFFERENCE
BETWEEN PROPORTIONS


A two-sample z-test is used to test the difference between two population
proportions $p_1$ and $p_2$ when a sample is randomly selected from each
population. The test statistic is $\hat{p}_1 - \hat{p}_2$, and the standardized test statistic is


$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \text{where } \bar{p} = \frac{x_1 + x_2}{n_1 + n_2} \text{ and } \bar{q} = 1 - \bar{p}.$$


Note: $n_1\bar{p}$, $n_1\bar{q}$, $n_2\bar{p}$, and $n_2\bar{q}$ must be at least 5.


If the null hypothesis states $p_1 = p_2$, $p_1 \le p_2$, or $p_1 \ge p_2$, then $p_1 = p_2$ is
assumed and the expression $p_1 - p_2$ is equal to 0 in the preceding test.


GUIDELINES


Using a Two-Sample z-Test for the Difference Between Proportions


1. State the claim mathematically and verbally. Identify the null
   and alternative hypotheses.
   In symbols: State $H_0$ and $H_a$.

2. Specify the level of significance.
   In symbols: Identify $\alpha$.

3. Determine the critical value(s).
   In symbols: Use Table 4 in Appendix B.

4. Determine the rejection region(s).

5. Find the weighted estimate of $p_1$ and $p_2$. Verify that $n_1\bar{p}$, $n_1\bar{q}$,
   $n_2\bar{p}$, and $n_2\bar{q}$ are at least 5.
   In symbols: $\bar{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$

6. Find the standardized test statistic and sketch the sampling distribution.
   In symbols: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

7. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If z is in the rejection region, reject $H_0$. Otherwise, fail to
   reject $H_0$.

8. Interpret the decision in the context of the original claim.


A hypothesis test for the difference between proportions can also be
performed using P-values. Use the guidelines listed above, skipping Steps 3 and 4.
After finding the standardized test statistic, use the Standard Normal Table to
calculate the P-value. Then make a decision to reject or fail to reject the null
hypothesis. If P is less than or equal to $\alpha$, reject $H_0$. Otherwise, fail to reject $H_0$.
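The P-value approach can be sketched in Python with the standard library's `statistics.NormalDist`, which stands in for the Standard Normal Table. The function name, the `tail` parameter, and the counts are hypothetical and used only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(x1, n1, x2, n2, tail="two"):
    """Return (z, P-value) for a pooled two-sample z-test, assuming p1 = p2 under H0.

    tail is "left", "right", or "two", matching the direction of Ha.
    """
    p_bar = (x1 + x2) / (n1 + n2)          # weighted estimate of p1 and p2
    q_bar = 1 - p_bar
    se = sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se           # (p1_hat - p2_hat) - 0, standardized
    cdf = NormalDist().cdf(z)              # area to the left of z
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    elif tail == "two":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p_value


# Hypothetical counts, two-tailed test at alpha = 0.05.
alpha = 0.05
z, p_value = two_proportion_p_value(60, 100, 45, 100, tail="two")
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.3f}, P = {p_value:.4f}: {decision}")
```

The sample-size check (each of $n_1\bar{p}$, $n_1\bar{q}$, $n_2\bar{p}$, $n_2\bar{q}$ at least 5) is omitted here for brevity and would still need to be verified before relying on the result.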
STUDY TIP
To find $x_1$ and $x_2$, use
$x_1 = n_1\hat{p}_1$ and $x_2 = n_2\hat{p}_2$.


Sample Statistics for Vehicles

Passenger cars       | Pickup trucks
$n_1 = 150$          | $n_2 = 200$
$\hat{p}_1 = 0.86$   | $\hat{p}_2 = 0.74$
$x_1 = 129$          | $x_2 = 148$


EXAMPLE 1 (See TI-83/84 Plus steps on page 479.)

> A Two-Sample z-Test for the Difference Between Proportions


A study of 150 randomly selected occupants in passenger cars and 200
randomly selected occupants in pickup trucks shows that 86% of occupants in
passenger cars and 74% of occupants in pickup trucks wear seat belts. At
$\alpha = 0.10$, can you reject the claim that the proportion of occupants who wear
seat belts is the same for passenger cars and pickup trucks? (Adapted from
National Highway Traffic Safety Administration)


> Solution

The claim is “the proportion of occupants who wear seat belts is the same for
passenger cars and pickup trucks.” So, the null and alternative hypotheses are

$H_0\colon p_1 = p_2$ (Claim) and $H_a\colon p_1 \neq p_2$.


Because the test is two-tailed and the level of significance is $\alpha = 0.10$,
the critical values are $-z_0 = -1.645$ and $z_0 = 1.645$. The rejection regions
are $z < -1.645$ and $z > 1.645$. The weighted estimate of $p_1$ and $p_2$ is


$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{129 + 148}{150 + 200} = \frac{277}{350} \approx 0.7914$$


and

$$\bar{q} = 1 - \bar{p} \approx 1 - 0.7914 = 0.2086.$$


Because $n_1\bar{p} \approx 150(0.7914)$, $n_1\bar{q} \approx 150(0.2086)$, $n_2\bar{p} \approx 200(0.7914)$, and
$n_2\bar{q} \approx 200(0.2086)$ are at least 5, you can use a two-sample z-test. The
standardized test statistic is


$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \approx \frac{(0.86 - 0.74) - 0}{\sqrt{(0.7914)(0.2086)\left(\dfrac{1}{150} + \dfrac{1}{200}\right)}} \approx 2.73.$$


The graph at the left shows the location of the rejection regions and the
standardized test statistic. Because z is in the rejection region, you should
decide to reject the null hypothesis.


Interpretation There is enough evidence at the 10% level of significance to 
reject the claim that the proportion of occupants who wear seat belts is the 
same for passenger cars and pickup trucks. 
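The arithmetic in Example 1 can be checked with a short Python sketch. The counts come from the example; the variable names are illustrative, and `NormalDist().inv_cdf` is used here in place of the table lookup for the critical value.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.10
x1, n1 = 129, 150   # passenger cars: occupants wearing seat belts, sample size
x2, n2 = 148, 200   # pickup trucks: occupants wearing seat belts, sample size

p_bar = (x1 + x2) / (n1 + n2)     # weighted estimate of p1 and p2
q_bar = 1 - p_bar

# Standardized test statistic with p1 - p2 = 0 under H0.
z = ((x1 / n1 - x2 / n2) - 0) / sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))

# Two-tailed test: split alpha between the two tails.
z_0 = NormalDist().inv_cdf(1 - alpha / 2)
in_rejection_region = z < -z_0 or z > z_0

print(f"p_bar = {p_bar:.4f}, z = {z:.2f}, z_0 = {z_0:.3f}, reject H0: {in_rejection_region}")
```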


> Try It Yourself 1


Consider the results of the NYTS study discussed in the Chapter Opener. At
$\alpha = 0.05$, can you support the claim that there is a difference between the
proportion of male high school students who smoke cigarettes and the
proportion of female high school students who smoke cigarettes?


a. Identify the claim and state $H_0$ and $H_a$.
b. Identify the level of significance $\alpha$.
c. Find the critical values and identify the rejection regions.
d. Find $\bar{p}$ and $\bar{q}$.
e. Verify that $n_1\bar{p}$, $n_1\bar{q}$, $n_2\bar{p}$, and $n_2\bar{q}$ are at least 5.
f. Find the standardized test statistic z. Sketch a graph.
g. Decide whether to reject the null hypothesis.
h. Interpret the decision in the context of the original claim.
Answer: Page A44
EXAMPLE 2


> A Two-Sample z-Test for the Difference Between Proportions


A medical research team conducted a study to test the effect of a
cholesterol-reducing medication. At the end of the study, the researchers found
that of the 4700 randomly selected subjects who took the medication, 301 died
of heart disease. Of the 4300 randomly selected subjects who took a placebo,
357 died of heart disease. At $\alpha = 0.01$, can you support the claim that the death
rate due to heart disease is lower for those who took the medication than for
those who took the placebo? (Adapted from The New England Journal of Medicine)


STUDY TIP
To find $\hat{p}_1$ and $\hat{p}_2$, use
$\hat{p}_1 = \dfrac{x_1}{n_1}$ and $\hat{p}_2 = \dfrac{x_2}{n_2}$.


Sample Statistics for Cholesterol-Reducing Medication

Medication            | Placebo
$n_1 = 4700$          | $n_2 = 4300$
$x_1 = 301$           | $x_2 = 357$
$\hat{p}_1 = 0.064$   | $\hat{p}_2 = 0.083$


> Solution


The claim is “the death rate due to heart disease is lower for those who
took the medication than for those who took the placebo.” So, the null and
alternative hypotheses are

$H_0\colon p_1 \ge p_2$ and $H_a\colon p_1 < p_2$. (Claim)

Because the test is left-tailed and the level of significance is $\alpha = 0.01$, the
critical value is $z_0 = -2.33$. The rejection region is $z < -2.33$. The weighted
estimate of $p_1$ and $p_2$ is


$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{301 + 357}{4700 + 4300} = \frac{658}{9000} \approx 0.0731$$

and

$$\bar{q} = 1 - \bar{p} \approx 1 - 0.0731 = 0.9269.$$

Because $n_1\bar{p} = 4700(0.0731)$, $n_1\bar{q} = 4700(0.9269)$, $n_2\bar{p} = 4300(0.0731)$, and
$n_2\bar{q} = 4300(0.9269)$ are at least 5, you can use a two-sample z-test.

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \approx \frac{(0.064 - 0.083) - 0}{\sqrt{(0.0731)(0.9269)\left(\dfrac{1}{4700} + \dfrac{1}{4300}\right)}} \approx -3.46$$


The graph at the left shows the location of the rejection region and the
standardized test statistic. Because z is in the rejection region, you should
decide to reject the null hypothesis.


Interpretation There is enough evidence at the 1% level of significance to 
support the claim that the death rate due to heart disease is lower for those 
who took the medication than for those who took the placebo. 
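For a left-tailed test such as Example 2, the critical value lies in the lower tail. The sketch below, with illustrative variable names, recomputes the statistic from the raw counts rather than from the rounded proportions, so the result can differ slightly in the third decimal place from the hand calculation; the decision is unaffected.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.01
x1, n1 = 301, 4700   # medication group: deaths from heart disease, sample size
x2, n2 = 357, 4300   # placebo group: deaths from heart disease, sample size

p_bar = (x1 + x2) / (n1 + n2)
q_bar = 1 - p_bar
z = ((x1 / n1 - x2 / n2) - 0) / sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))

# Left-tailed test: all of alpha is in the lower tail (about -2.33 in the table).
z_0 = NormalDist().inv_cdf(alpha)
reject = z < z_0

print(f"z = {z:.3f}, z_0 = {z_0:.3f}, reject H0: {reject}")
```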


> Try It Yourself 2


Consider the results of the NYTS study discussed in the Chapter Opener. At
$\alpha = 0.05$, can you support the claim that the proportion of male high school
students who smoke cigars is greater than the proportion of female high school
students who smoke cigars?


a. Identify the claim and state $H_0$ and $H_a$.
b. Identify the level of significance $\alpha$.
c. Find the critical value and identify the rejection region.
d. Find $\bar{p}$ and $\bar{q}$.
e. Verify that $n_1\bar{p}$, $n_1\bar{q}$, $n_2\bar{p}$, and $n_2\bar{q}$ are at least 5.
f. Find the standardized test statistic z. Sketch a graph.
g. Decide whether to reject the null hypothesis.
h. Interpret the decision in the context of the original claim.
Answer: Page A44
BUILDING BASIC SKILLS AND VOCABULARY


1. What conditions are necessary in order to use the z-test to test the difference
between two population proportions?


2. Explain how to perform a two-sample z-test for the difference between two
population proportions.


In Exercises 3-8, decide whether the normal sampling distribution can be used. If
it can be used, test the claim about the difference between two population
proportions $p_1$ and $p_2$ at the given level of significance $\alpha$ using the given sample
statistics. Assume the sample statistics are from independent, random samples.


3. Claim: $p_1 \neq p_2$; $\alpha = 0.01$.
Sample statistics: $x_1 = 35$, $n_1 = 70$ and $x_2 = 36$, $n_2 = 60$


4. Claim: $p_1 < p_2$; $\alpha = 0.05$.
Sample statistics: $x_1 = 471$, $n_1 = 785$ and $x_2 = 372$, $n_2 = 465$


5. Claim: $p_1 = p_2$; $\alpha = 0.10$.
Sample statistics: $x_1 = 42$, $n_1 = 150$ and $x_2 = 76$, $n_2 = 200$


6. Claim: $p_1 > p_2$; $\alpha = 0.01$.
Sample statistics: $x_1 = 6$, $n_1 = 20$ and $x_2 = 4$, $n_2 = 30$


7. Claim: $p_1 \le p_2$; $\alpha = 0.10$.
Sample statistics: $x_1 = 344$, $n_1 = 860$ and $x_2 = 304$, $n_2 = 800$


8. Claim: $p_1 = p_2$; $\alpha = 0.05$.
Sample statistics: $x_1 = 29$, $n_1 = 45$ and $x_2 = 25$, $n_2 = 30$


USING AND INTERPRETING CONCEPTS


Testing the Difference Between Two Proportions In Exercises 9-18,
(a) identify the claim and state $H_0$ and $H_a$, (b) find the critical value(s) and
identify the rejection region(s), (c) find the standardized test statistic z, (d) decide
whether to reject or fail to reject the null hypothesis, and (e) interpret the decision
in the context of the original claim. If convenient, use technology to solve
the problem. In each exercise, assume the samples are randomly selected and
independent.


9. Plantar Heel Pain A medical research team conducted a study to test
the effect of magnetic insoles for treating plantar heel pain. In the study,
54 subjects wore magnetic insoles and 41 subjects wore nonmagnetic insoles.
All subjects wore their insoles for 4 weeks. The results are shown below.
At $\alpha = 0.01$, can you support the claim that there is a difference in the
proportion of subjects who feel all or mostly better after 4 weeks between
subjects who used magnetic insoles and subjects who used nonmagnetic
insoles? (Adapted from The Journal of the American Medical Association)


[Figure: Do You Feel All or Mostly Better? Magnetic Insoles vs. Nonmagnetic Insoles]




10. Cancer Drug A gastrointestinal stromal tumor is a rare form of cancer
which develops in muscle tissue and blood vessels within the stomach or
small intestine. A medical research team conducted a study to test the effect
of a drug on this type of cancer. In the study, 300 subjects took the drug and
300 subjects took a placebo. All subjects had surgery to remove the tumor
and then took the drug or placebo for one year. At $\alpha = 0.10$, can you
support the claim that the proportion of subjects who are cancer-free after
one year is greater for subjects who took the drug than for subjects who took
a placebo? (Adapted from American College of Surgeons Oncology Group)


[Figure: Are You Cancer-Free One Year After Surgery? Drug vs. Placebo]


11. Attending College In a survey of 875,000 males who completed high school
during the past 12 months, 65.8% were enrolled in college. In a survey of
901,000 females who completed high school during the past 12 months,
66.1% were enrolled in college. At $\alpha = 0.05$, can you support the claim that
the proportion of males who enrolled in college is less than the proportion
of females who enrolled in college? (Source: National Center for Education
Statistics)


12. Consumer Spending In a survey of 433 females, 72% have reduced the
amount they spend on eating out. In a survey of 577 males, 65% have
reduced the amount they spend on eating out. At $\alpha = 0.01$, can you reject
the claim that there is no difference in the proportion of females who have
reduced the amount they spend on eating out and the proportion of males
who have reduced the amount they spend on eating out? (Adapted from
Morpace)


13. Migraines A medical research team conducted a study to test the effect of
a migraine drug. Of the 400 subjects who took the drug, 25% were pain-free
after two hours. Of the 407 subjects who took a placebo, 10% were pain-free
after two hours. At $\alpha = 0.05$, can you reject the claim that the proportion of
subjects who are pain-free is the same for the two groups? (Adapted from
International Migraine Pain Assessment Clinical Trial)


14. Migraines A medical research team conducted a study to test the effect of
a migraine drug. Of the 400 subjects who took the drug, 65% were free of
nausea after two hours. Of the 407 subjects who took a placebo, 53% were
free of nausea after two hours. At $\alpha = 0.10$, can you support the claim that
the proportion of subjects who are free of nausea is greater for subjects
who took the drug than for subjects who took a placebo? (Adapted from
International Migraine Pain Assessment Clinical Trial)


15. Motorcycle Helmet Use In a survey of 600 motorcyclists, 404 wear a
helmet. In another survey of 500 motorcyclists taken one year before, 317
wore a helmet. At $\alpha = 0.05$, can you support the claim that the proportion
of motorcyclists who wear a helmet is now greater? (Adapted from National
Highway Traffic Safety Administration)
16. Motorcycle Helmet Use In a survey of 300 motorcyclists from the
Northeast, 183 wear a helmet. In a survey of 300 motorcyclists from the
Midwest, 201 wear a helmet. At $\alpha = 0.10$, can you support the claim that the
proportion of motorcyclists who wear a helmet in the Northeast is less than
the proportion of motorcyclists who wear a helmet in the Midwest? (Adapted
from National Highway Traffic Safety Administration)


17. Internet Users In a survey of 450 adults 18 to 29 years of age, 419 said
they use the Internet. In a survey of 400 adults 30 to 49 years of age, 324 said
they use the Internet. At $\alpha = 0.01$, can you reject the claim that the
proportion of Internet users is the same for the two age groups? (Adapted
from Pew Research Center)


18. Internet Users In a survey of 485 adults who live in an urban area, 359 said
they use the Internet. In a survey of 315 adults who live in a rural area, 221
said they use the Internet. At $\alpha = 0.10$, can you support the claim that the
proportion of adults who use the Internet is greater for adults who live in an
urban area than for adults who live in a rural area? (Adapted from Pew
Research Center)


DMV Wait Times In Exercises 19-22, refer to the figure, which shows the
percentages of customers waiting 20 minutes or less at four district offices of the
Department of Motor Vehicles (DMV) in Virginia. Assume the survey included
400 people from each district. (Adapted from Virginia Department of Motor Vehicles)


[Figure: DMV Wait Times. Percentage of customers waiting 20 minutes or less]


19. Fairfax North and Fairfax South At $\alpha = 0.05$, can you reject the claim
that the proportion of customers who wait 20 minutes or less is the same at
the Fairfax North office and the Fairfax South office?


20. Staunton and Fairfax South At $\alpha = 0.01$, can you support the claim that
the proportion of customers who wait 20 minutes or less is greater at the
Staunton office than at the Fairfax South office?


21. Roanoke and Staunton At $\alpha = 0.10$, can you support the claim that the
proportion of customers who wait 20 minutes or less at the Roanoke office
is less than the proportion of customers who wait 20 minutes or less at the
Staunton office?


22. Roanoke and Fairfax North At $\alpha = 0.05$, can you support the claim that
there is a difference between the proportion of customers who wait
20 minutes or less at the Roanoke office and the proportion of customers
who wait 20 minutes or less at the Fairfax North office?


23. Writing Suppose you are testing Exercise 21 at $\alpha = 0.01$. Do you still make
the same decision? Explain your reasoning.


24. Writing Suppose you are testing Exercise 22 at $\alpha = 0.10$. Do you still make
the same decision? Explain your reasoning.
(SC) In Exercises 25-28, refer to the figure and use StatCrunch to test the
claim. Assume the survey included 13,300 men and 13,200 women in 2000 and
14,500 men and 14,200 women in 2009, and assume the samples are random and
independent. (Adapted from U.S. Census Bureau)


[Figure: Moving out of the nest. Percentage of 18- to 24-year-olds living in
parents’ homes in the USA, for men and women, in 2000 and 2009]


25. Men: Then and Now At $\alpha = 0.05$, can you support the claim that the
proportion of men ages 18 to 24 living in their parents’ homes was greater
in 2000 than in 2009?


26. Women: Then and Now At $\alpha = 0.05$, can you support the claim that the
proportion of women ages 18 to 24 living in their parents’ homes was greater
in 2000 than in 2009?


27. Then: Men and Women At $\alpha = 0.01$, can you reject
the claim that the proportion of 18- to 24-year-olds living in their parents’
homes in 2000 was the same for men and women?


28. Now: Men and Women At $\alpha = 0.10$, can you reject the claim that the
proportion of 18- to 24-year-olds living in their parents’ homes in 2009 was
the same for men and women?


EXTENDING CONCEPTS


Constructing Confidence Intervals for $p_1 - p_2$ You can construct a
confidence interval for the difference between two population proportions $p_1 - p_2$
by using the following inequality.


$$(\hat{p}_1 - \hat{p}_2) - z_c\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z_c\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
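A minimal Python sketch of this interval follows. Unlike the hypothesis test, the interval uses each sample's own proportion rather than a weighted estimate. The function name and the counts are hypothetical, and `NormalDist().inv_cdf` is assumed as a substitute for looking up $z_c$ in a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_difference_ci(x1, n1, x2, n2, confidence):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    q1_hat, q2_hat = 1 - p1_hat, 1 - p2_hat
    # Critical value z_c leaves (1 - confidence) / 2 in each tail.
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_c * sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


# Hypothetical counts (not data from the exercises): 95% confidence interval.
low, high = proportion_difference_ci(60, 100, 45, 100, confidence=0.95)
print(f"{low:.4f} < p1 - p2 < {high:.4f}")
```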


In Exercises 29 and 30, construct the indicated confidence interval for $p_1 - p_2$.
Assume the samples are random and independent.


29. Students Planning to Study Education In a survey of 10,000 students
taking the SAT, 7% were planning to study education in college. In another
survey of 8000 students taken 10 years before, 9% were planning to study
education in college. Construct a 95% confidence interval for $p_1 - p_2$,
where $p_1$ is the proportion from the recent survey and $p_2$ is the proportion
from the survey taken 10 years ago. (Adapted from The College Board)


30. Students Planning to Study Health-Related Fields In a survey of 10,000
students taking the SAT, 19% were planning to study health-related fields
in college. In another survey of 8000 students taken 10 years before, 16%
were planning to study health-related fields in college. Construct a 90%
confidence interval for $p_1 - p_2$, where $p_1$ is the proportion from the recent
survey and $p_2$ is the proportion from the survey taken 10 years ago. (Adapted
from The College Board)
USES AND ABUSES


[Figure: People with arthritis, divided into a group given the new medication
and a group given a placebo]


Uses

Hypothesis Testing with Two Samples Hypothesis testing enables you to 
decide whether differences in samples indicate actual differences in populations 
or are merely due to sampling error. For instance, a study conducted on 
2 groups of 4-year-olds compared the behavior of the children who attended 
preschool with the behavior of those who stayed home with a parent. 
Aggressive behavior such as stealing toys, pushing other children, and starting 
fights was measured in both groups. The study showed that children who 
attended preschool were three times more likely to be aggressive than those 
who stayed home. These statistics were used to persuade parents to keep their 
children at home until they start school at age 5. 


Abuses 


Study Funding The study did not mention that it is normal for 4-year-olds to 
display aggressive behavior. Parents who keep their children at home but take 
them to play groups also observe their children being aggressive. Psychologists 
have suggested that this is the way children learn to interact with each other. 
The children who stayed home were less aggressive, but their behavior was 
considered abnormal. A follow-up study performed by a different group 
demonstrated that the children who stayed home before attending school 
ended up being more aggressive at a later age than those who had attended 
preschool. 

The first study was funded by a mother support group who used the 
statistics to promote their own predetermined agenda. When dealing with 
statistics, always know who is paying for a study. (Source: British Broadcasting 
Corporation) 


Using Nonrepresentative Samples In comparisons of data collected from 
two different samples, care should be taken to ensure that there are no 
confounding variables. For instance, suppose you are examining a claim that a 
new arthritis medication lessens joint pain. 

If the group that is given the medication is over 60 years old and the group 
given the placebo is under 40, variables other than the medication might affect 
the outcome of the study. When you look for other abuses in a study, consider 
how the claim in the study was determined. What were the sample sizes? Were 
the samples random? Were they independent? Was the sampling conducted by 
an unbiased researcher? 


EXERCISES


1. Using Nonrepresentative Samples Assume that you work for the Food 
and Drug Administration. A pharmaceutical company has applied for 
approval to market a new arthritis medication. The research involved a test 
group that was given the medication and another test group that was given 
a placebo. Describe some ways that the test groups might not have been 
representative of the entire population of people with arthritis. 


2. Medical research often involves blind and double-blind testing. Explain 
what these two terms mean. 
CHAPTER SUMMARY


What did you learn? | EXAMPLE(S) | REVIEW EXERCISES


Section 8.1

■ How to decide whether two samples are independent or dependent | 1 | 1, 2

■ How to perform a two-sample z-test for the difference between two means
$\mu_1$ and $\mu_2$ using large independent samples | 2, 3 | 3-10

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$


Section 8.2

■ How to perform a t-test for the difference between two population means
$\mu_1$ and $\mu_2$ using small independent samples | 1, 2 | 11-18

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$


Section 8.3

■ How to perform a t-test to test the mean of the differences for a
population of paired data | 1, 2 | 19-24

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$


Two-Sample Hypothesis Testing for Population Means

Are the samples independent?
  No: Are both populations normal?
    No: Cannot use hypothesis tests discussed in this chapter.
    Yes: Use t-test for dependent samples (Section 8.3).
  Yes: Are both samples large?
    Yes: Use z-test for large independent samples (Section 8.1).
    No: Are both populations normal?
      No: Cannot use hypothesis tests discussed in this chapter.
      Yes: Are both standard deviations known?
        No: Use t-test for small independent samples (Section 8.2).
        Yes: Use z-test (Section 8.1).


Section 8.4

■ How to perform a z-test for the difference between two population
proportions $p_1$ and $p_2$ | 1, 2 | 25-32

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
REVIEW EXERCISES


SECTION 8.1


In Exercises 1 and 2, classify the two given samples as independent or dependent.
Explain your reasoning.


1. Sample 1: Air pollution measurements for 15 cities
   Sample 2: Air pollution measurements for those 15 cities five years after a
   law was passed restricting carbon emissions


2. Sample 1: Pulse rates of runners before a marathon
   Sample 2: Pulse rates of the same runners after a marathon


In Exercises 3-6, use the given sample statistics to test the claim about the difference
between two population means $\mu_1$ and $\mu_2$ at the given level of significance $\alpha$. The
samples are random and independent.


3. Claim: $\mu_1 = \mu_2$; $\alpha = 0.05$. Sample statistics: $\bar{x}_1 = 1.28$, $s_1 = 0.30$, $n_1 = 96$
and $\bar{x}_2 = 1.34$, $s_2 = 0.23$, $n_2 = 85$


4. Claim: $\mu_1 = \mu_2$; $\alpha = 0.01$. Sample statistics: $\bar{x}_1 = 5595$, $s_1 = 52$, $n_1 = 156$
and $\bar{x}_2 = 5575$, $s_2 = 68$, $n_2 = 216$


5. Claim: $\mu_1 < \mu_2$; $\alpha = 0.10$. Sample statistics: $\bar{x}_1 = 0.28$, $s_1 = 0.11$, $n_1 = 41$
and $\bar{x}_2 = 0.33$, $s_2 = 0.10$, $n_2 = 34$


6. Claim: $\mu_1 \neq \mu_2$; $\alpha = 0.05$. Sample statistics: $\bar{x}_1 = 87$, $s_1 = 14$, $n_1 = 410$
and $\bar{x}_2 = 85$, $s_2 = 15$, $n_2 = 340$


In Exercises 7 and 8, (a) identify the claim and state $H_0$ and $H_a$, (b) find the
critical value(s) and identify the rejection region(s), (c) find the standardized test
statistic z, (d) decide whether to reject or fail to reject the null hypothesis, and
(e) interpret the decision in the context of the original claim. If convenient, use
technology to solve the problem. In each exercise, assume the samples are randomly
selected and independent.


7. In a fast food study, a researcher finds that the mean sodium content of
42 Wendy’s fish sandwiches is 1010 milligrams with a standard deviation of
75 milligrams. The mean sodium content of 39 Long John Silver’s fish
sandwiches is 1180 milligrams with a standard deviation of 90 milligrams.
At $\alpha = 0.05$, is there enough evidence for the researcher to conclude that
the Wendy’s fish sandwich has less sodium than the Long John Silver’s fish
sandwich? (Adapted from Wendy’s International Inc. and Long John Silver’s Inc.)


8. A government agency states that the mean annual salary of civilian federal
employees is the same in California and Illinois. The mean annual salary for
180 civilian federal employees in California is $66,210 and the standard
deviation is $6385. The mean annual salary for 180 civilian federal employees
in Illinois is $67,390 and the standard deviation is $5998. At $\alpha = 0.10$, is
there enough evidence to reject the agency’s claim? (Adapted from U.S. Office
of Personnel Management)


9. Suppose you are testing Exercise 7 at $\alpha = 0.01$. Do you still make the same
decision? Explain your reasoning.


10. Suppose you are testing Exercise 8 at $\alpha = 0.05$. Do you still make the same
decision? Explain your reasoning.
SECTION 8.2


In Exercises 11-16, use the given sample statistics to test the claim about the
difference between two population means $\mu_1$ and $\mu_2$ at the given level of
significance $\alpha$. Assume that the samples are random and independent and that
the populations are approximately normally distributed.


11. Claim: $\mu_1 = \mu_2$; $\alpha = 0.05$. Sample statistics: $\bar{x}_1 = 228$, $s_1 = 27$, $n_1 = 20$ and
$\bar{x}_2 = 207$, $s_2 = 25$, $n_2 = 13$. Assume equal variances.


12. Claim: $\mu_1 = \mu_2$; $\alpha = 0.10$. Sample statistics: $\bar{x}_1 = 0.015$, $s_1 = 0.011$, $n_1 = 8$
and $\bar{x}_2 = 0.019$, $s_2 = 0.004$, $n_2 = 6$. Assume variances are not equal.


13. Claim: $\mu_1 = \mu_2$; $\alpha = 0.05$. Sample statistics: $\bar{x}_1 = 183.5$, $s_1 = 1.3$, $n_1 = 25$
and $\bar{x}_2 = 184.7$, $s_2 = 3.9$, $n_2 = 25$. Assume variances are not equal.


14. Claim: $\mu_1 = \mu_2$; $\alpha = 0.01$. Sample statistics: $\bar{x}_1 = 44.5$, $s_1 = 5.85$, $n_1 = 17$
and $\bar{x}_2 = 49.1$, $s_2 = 5.25$, $n_2 = 18$. Assume equal variances.


15. Claim: $\mu_1 \neq \mu_2$; $\alpha = 0.01$. Sample statistics: $\bar{x}_1 = 61$, $s_1 = 3.3$, $n_1 = 5$ and
$\bar{x}_2 = 55$, $s_2 = 1.2$, $n_2 = 7$. Assume equal variances.


16. Claim: $\mu_1 = \mu_2$; $\alpha = 0.10$. Sample statistics: $\bar{x}_1 = 520$, $s_1 = 25$, $n_1 = 7$ and
$\bar{x}_2 = 500$, $s_2 = 55$, $n_2 = 6$. Assume variances are not equal.


In Exercises 17 and 18, (a) identify the claim and state H₀ and Hₐ, (b) find the
critical value(s) and identify the rejection region(s), (c) find the standardized test
statistic t, (d) decide whether to reject or fail to reject the null hypothesis, and
(e) interpret the decision in the context of the original claim. If convenient, use
technology to solve the problem. In each exercise, assume the populations are
normally distributed.


17. A study of methods for teaching reading in the third grade was conducted.
A classroom of 21 students participated in directed reading activities for
eight weeks. Another classroom, with 23 students, followed the same
curriculum without the activities. Students in both classrooms then took
the same reading test. The scores of the two groups are shown in the
back-to-back stem-and-leaf plot.


Classroom With Activities |   | Classroom Without Activities
                          | 1 | 0 7 9
                        4 | 2 | 0 6 8
                        3 | 3 | 3 7 7
            9 9 6 4 3 3 3 | 4 | 1 2 2 2 3 6 8
          9 8 7 7 6 4 3 2 | 5 | 3 4 5 5
                    7 2 1 | 6 | 0 2
                        1 | 7 |
                          | 8 | 5

Key: 4|2 = 24 (classroom with activities)
     2|0 = 20 (classroom without activities)


At α = 0.05, is there enough evidence to conclude that third graders
taught with the directed reading activities scored higher than those
taught without the activities? Assume the population variances are
equal. (Source: StatLib/Schmitt, Maribeth C., The Effects of an Elaborated
Directed Reading Activity on the Metacomprehension Skills of Third Graders)

18. A real estate agent claims that there is no difference between the mean
household incomes of two neighborhoods. The mean income of 12 randomly
selected households from the first neighborhood was $32,750 with a standard
deviation of $1900. In the second neighborhood, 10 randomly selected
households had a mean income of $31,200 with a standard deviation of
$1825. At α = 0.01, can you reject the real estate agent’s claim? Assume the
population variances are equal.


SECTION 8.3


In Exercises 19-22, using a test for dependent, random samples, test the claim about
the mean of the difference of the two populations at the given level of significance
α using the given statistics. Is the test right-tailed, left-tailed, or two-tailed? Assume
the populations are normally distributed.


19. Claim: μ_d = 0; α = 0.01. Statistics: d̄ = 8.5, s_d = 10.7, n = 16
20. Claim: μ_d < 0; α = 0.10. Statistics: d̄ = 3.2, s_d = 5.68, n = 25
21. Claim: μ_d = 0; α = 0.10. Statistics: d̄ = 10.3, s_d = 18.19, n = 33
22. Claim: μ_d ≠ 0; α = 0.05. Statistics: d̄ = 17.5, s_d = 4.05, n = 37


In Exercises 23 and 24, (a) identify the claim and state H₀ and Hₐ, (b) find the
critical value(s) and identify the rejection region(s), (c) calculate d̄ and s_d, (d) find
the standardized test statistic t, (e) decide whether to reject or fail to reject the null
hypothesis, and (f) interpret the decision in the context of the original claim.
If convenient, use technology to solve the problem. For each sample, assume the
population is normally distributed.


23. A medical researcher wants to test the effects of calcium supplements
on men’s systolic blood pressure. In part of the study, 10 randomly
selected men are given a calcium supplement for 12 weeks. The
researcher measures the men’s systolic blood pressure (in millimeters of
mercury) before and after the 12-week study and records the results
shown below. At α = 0.10, can the researcher claim that the men’s
systolic blood pressure decreased? (Source: The Journal of the American
Medical Association)


24. A physical fitness instructor claims that a particular weight loss supplement
will help users lose weight after two weeks. The table shows the weights (in
pounds) of 9 mildly overweight adults before using the supplement and two
weeks after using the supplement. At α = 0.05, can you conclude that the
supplement helps users lose weight?


User               1    2    3    4    5    6    7    8    9
Weight (before)  228  210  245  272  203  198  256  217  240
Weight (after)   225  208  242  270  205  196  250  220  240


SECTION 8.4


In Exercises 25-28, decide whether the normal sampling distribution can be used.
If it can be used, test the claim about the difference between two population
proportions p₁ and p₂ at the given level of significance α using the given sample
statistics. Assume the sample statistics are from independent, random samples.


25. Claim: p₁ = p₂; α = 0.05. Sample statistics: x₁ = 425, n₁ = 840 and
x₂ = 410, n₂ = 760

26. Claim: p₁ = p₂; α = 0.01. Sample statistics: x₁ = 36, n₁ = 100 and
x₂ = 46, n₂ = 200

27. Claim: p₁ > p₂; α = 0.10. Sample statistics: x₁ = 261, n₁ = 556 and
x₂ = 207, n₂ = 483

28. Claim: p₁ < p₂; α = 0.05. Sample statistics: x₁ = 86, n₁ = 900 and
x₂ = 107, n₂ = 1200


In Exercises 29 and 30, (a) identify the claim and state H₀ and Hₐ, (b) find the
critical value(s) and identify the rejection region(s), (c) find the standardized test
statistic z, (d) decide whether to reject or fail to reject the null hypothesis, and
(e) interpret the decision in the context of the original claim. If convenient, use
technology to solve the problem. In each exercise, assume the samples are randomly
selected and that the samples are independent.


29. In a survey of 900 U.S. adults in 2008, 468 considered the amount of federal
income tax they had to pay to be too high. In a recent year, in a survey of
1027 U.S. adults, 472 considered the amount too high. At α = 0.01, can you
reject the claim that the proportions of U.S. adults who considered the
amount of federal income tax they had to pay to be too high were the same
for the two years? (Adapted from The Gallup Poll)

[Figure: Study Done in 2008 (900 U.S. Adults) and Study Done Recently (1027 U.S. Adults)]


30. In a survey of 1000 U.S. adults in 2007, 57% said it is likely that life exists on
other planets. In a recent year, in a survey of 1000 U.S. adults, 53% said it is
likely that life exists on other planets. At α = 0.05, can you support the claim
that the proportion of U.S. adults who believe it is likely that life exists on
other planets is less now than in 2007? (Source: Rasmussen Reports)

[Figure: Survey Done in 2007 (1000 U.S. Adults), No: 43%; Survey Done Recently (1000 U.S. Adults), No: 47%]


31. Suppose you are testing Exercise 29 at α = 0.05. Do you still make the same
decision? Explain your reasoning.

32. Suppose you are testing Exercise 30 at α = 0.01. Do you still make the same
decision? Explain your reasoning.

Take this quiz as you would take a quiz in class. After you are done, check
your work against the answers given in the back of the book.

For this quiz, do the following.

(a) Write the claim mathematically and identify H₀ and Hₐ.
(b) Determine whether the hypothesis test is a one-tailed test or a two-tailed test
    and whether to use a z-test or a t-test. Explain your reasoning.
(c) Find the critical value(s) and identify the rejection region(s).
(d) Use the appropriate test to find the appropriate standardized test statistic. If
    convenient, use technology.
(e) Decide whether to reject or fail to reject the null hypothesis.
(f) Interpret the decision in the context of the original claim.


1. The mean score on a science assessment for 49 randomly selected male high
school students was 149 with a standard deviation of 35. The mean score on
the same test for 50 randomly selected female high school students was 145
with a standard deviation of 33. At α = 0.05, can you support the claim that
the mean score on the science assessment for the male high school students
was higher than for the female high school students? (Adapted from National
Center for Education Statistics)


2. A science teacher claims that the mean scores on a science assessment test
for fourth grade boys and girls are equal. The mean score for 13 randomly
selected boys is 153 with a standard deviation of 32, and the mean score
for 15 randomly selected girls is 149 with a standard deviation of 30. At
α = 0.01, can you reject the teacher’s claim? Assume the populations are
normally distributed and the variances are equal. (Adapted from National Center
for Education Statistics)


3. In a random sample of 800 U.S. adults, 336 are worried that they or someone
in their family will become a victim of terrorism. In another random sample
of 1100 U.S. adults taken a month earlier, 429 were worried that they or
someone in their family would become a victim of terrorism. At α = 0.10, can
you reject the claim that the proportion of U.S. adults who are worried that
they or someone in their family will become a victim of terrorism has not
changed? (Adapted from The Gallup Poll)


4. The table shows the credit scores for 12 randomly selected adults who are 
considered high-risk borrowers before and two years after they attend 
a personal finance seminar. At α = 0.01, is there enough evidence to 
conclude that the seminar helps adults increase their credit scores? 


602 644 656 632 664 
650 660 650 680 702 

Real Statistics - Real Decisions


The National Hospital Discharge Survey (NHDS) is a national
probability survey that has been conducted annually since 1965 by the
Centers for Disease Control and Prevention’s National Center for
Health Statistics. From 1988 to 2007, the NHDS collected data from a
sample of about 270,000 inpatient records provided by a national
sample of about 500 hospitals. Beginning in 2007, this sample size
was reduced to 239 hospitals. Only non-Federal short-stay hospitals,
such as general hospitals and children’s general hospitals, are included
in the survey. The results of this survey provide information on the
characteristics of inpatients discharged from these hospitals and are
used to examine important topics of interest in public health.

You work for the National Center for Health Statistics. You want
to test the claim that the mean length of stay for inpatients today is
different than what it was a decade ago by analyzing data from a
random selection of inpatient records. The results for several inpatients
are shown in the histograms from a decade ago and the current year.

[Histogram: Inpatients Length of Stay (Decade Ago). Horizontal axis: Length of stay (in days). x̄₁ = 5.38, s₁ = 1.65, n₁ = 26]
[Histogram: Inpatients Length of Stay (Current Year). n₂ = 28]


1. How Could You Do It?

Explain how you could use the given sampling technique to select
the sample for the study.

(a) stratified sample    (b) cluster sample
(c) systematic sample    (d) simple random sample

2. Choosing a Sampling Technique

(a) Which sampling technique in Exercise 1 would you choose to
    implement for the study? Why?
(b) Identify possible flaws or biases in your study.

3. Choosing a Test

To test the claim that there is a difference in the mean length of
hospital stays, should you use a z-test or a t-test? Are the samples
independent or dependent? Do you need to know what each
population’s distribution is? Do you need to know anything about
the population variances?


4. Testing a Mean


Test the claim that there is a difference in the mean length of
hospital stays for inpatients. Assume the populations are normal and
the population variances are not equal. Use α = 0.10. Interpret the
test’s decision. Does the decision support the claim?

TECHNOLOGY MINITAB TI-83/84 PLUS


TAILS OVER HEADS


In the article “Tails over Heads” in the Washington
Post (Oct. 13, 1996), journalist William Casey describes
one of his hobbies: keeping track of every coin he
finds on the street! From January 1, 1985 until the
article was written, Casey found 11,902 coins.

As each coin is found, Casey records the time, date,
location, value, mint location, and whether the coin is
lying heads up or tails up. In the article, Casey notes
that 6130 coins were found tails up and 5772 were
found heads up. Of the 11,902 coins found, 43 were
minted in San Francisco, 7133 were minted in
Philadelphia, and 4726 were minted in Denver.

A simulation of Casey’s experiment can be done in
MINITAB as shown below. A frequency histogram of
one simulation’s results is shown at the right.

[Histogram: Coin Toss Simulation. Horizontal axis: Number of heads; vertical axis: Frequency]

MINITAB
Number of rows of data to generate: 500
Store in column(s): C1
Number of trials: 11902
Event probability: .5


EXERCISES

1. Use a technology tool to perform a one-sample
z-test to test the hypothesis that the
probability that a “found coin” will be lying
heads up is 0.5. Use α = 0.01. Use Casey’s
data as your sample and write your conclusion
as a sentence.

2. Do Casey’s data differ significantly from
chance? If so, what might be the reason?

3. In the simulation shown above, what percent
of the trials had heads less than or equal to the
number of heads in Casey’s data? Use a
technology tool to repeat the simulation. Are
your results comparable?

In Exercises 4 and 5, use a technology tool to
perform a two-sample z-test to decide whether
there is a difference in the mint dates and in the
values of coins found on a street from 1985
through 1996. Write your conclusion as a sentence.
Use α = 0.05.

4. Mint dates of coins (years)
Philadelphia: x̄₁ = 1984.8    s₁ = 8.6
Denver:       x̄₂ = 1982.4    s₂ = 6.4

5. Value of coins (dollars)
Philadelphia: x̄₁ = $0.034    s₁ = $0.054
Denver:       x̄₂ = $0.033    s₂ = $0.052


Extended solutions are given in the Technology Supplement. Technical
instruction is provided for MINITAB, Excel, and the TI-83/84 Plus.
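
For readers without MINITAB, the one-sample z-test of Exercise 1 and the coin toss simulation can be sketched in Python using only the standard library. This is a minimal sketch, not the textbook's method: the function names `one_proportion_z` and `simulate_head_counts` and the fixed seed are assumptions made for illustration, and the simulation mirrors the MINITAB settings shown above (500 rows, 11,902 trials, event probability 0.5).

```python
import math
import random
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """Return (z, two-tailed P-value) for H0: p = p0, using the normal approximation."""
    p_hat = x / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def simulate_head_counts(n_tosses, n_rows, seed=0):
    """Simulate n_rows experiments of n_tosses fair tosses; return the heads count of each."""
    rng = random.Random(seed)  # hypothetical fixed seed so the run is repeatable
    # Each random bit stands for one fair toss; counting the 1 bits counts the heads.
    return [bin(rng.getrandbits(n_tosses)).count("1") for _ in range(n_rows)]


# Casey's data: 5772 heads out of 11,902 coins.
z, p_value = one_proportion_z(x=5772, n=11902, p0=0.5)

# 500 simulated experiments of 11,902 tosses each.
head_counts = simulate_head_counts(n_tosses=11902, n_rows=500)
share_at_or_below = sum(c <= 5772 for c in head_counts) / len(head_counts)

print(f"z = {z:.3f}, two-tailed P-value = {p_value:.5f}")
print(f"share of simulated rows with heads <= 5772: {share_at_or_below:.3f}")
```

Comparing the printed P-value with α = 0.01 gives the decision for Exercise 1, and the printed share corresponds to the percent asked for in Exercise 3.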

USING TECHNOLOGY TO PERFORM TWO-SAMPLE
HYPOTHESIS TESTS


Here are some MINITAB and TI-83/84 Plus printouts for several examples
in this chapter.


(See Example 1, page 444.)


MINITAB menu:
Display Descriptive Statistics...
Store Descriptive Statistics...
Graphical Summary...
1-Sample Z...
1-Sample t...
2-Sample t...
Paired t...
1 Proportion...
2 Proportions...


MINITAB

Two-Sample T-Test and CI

Sample   N    Mean    StDev   SE Mean
1        [illegible]
2        18   459.0   24.5    5.8

Difference = mu (1) - mu (2)
Estimate for difference: 14.0
90% CI for difference: (-13.8, 41.8)
T-Test of difference = 0 (vs not =): T-Value = 0.92  P-Value = 0.380  DF = 9


(See Example 1, page 453.)


Vertical Jump Heights, Before and After Using Shoes


25 | 29 | 33 | 34 | 35 | 30


MINITAB

Paired T-Test and CI: C1, C2

Paired T for C1 - C2

             N   Mean     StDev   SE Mean
C1           8   27.88    4.32    1.53
C2           8   29.63    4.07    1.44
Difference   8   -1.750   2.121   0.750

90% upper bound for mean difference: -0.689

T-Test of mean difference = 0 (vs < 0): T-Value = -2.33  P-Value = 0.026
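
The T-Value in the paired printout can be checked by hand from the summary row for the differences. The following Python sketch assumes the difference row reads d̄ = -1.750, s_d = 2.121, n = 8; these values are partly garbled in the printout, and they are assumed here because they agree with the column means (27.88 - 29.63) and with the printed T-Value. The function name `paired_t_statistic` is hypothetical.

```python
import math


def paired_t_statistic(d_bar, s_d, n, mu_d=0.0):
    """Standardized test statistic for dependent samples: t = (d_bar - mu_d) / (s_d / sqrt(n))."""
    standard_error = s_d / math.sqrt(n)
    return (d_bar - mu_d) / standard_error


# Assumed summary statistics of the differences (before minus after).
t = paired_t_statistic(d_bar=-1.750, s_d=2.121, n=8)
degrees_of_freedom = 8 - 1

print(f"t = {t:.2f} with {degrees_of_freedom} degrees of freedom")
```

The result can be compared with the T-Value reported by MINITAB.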

(See Example 2, page 432.)


TI-83/84 PLUS

EDIT CALC TESTS
1: Z-Test...
2: T-Test...
3: 2-SampZTest...
4: 2-SampTTest...
5: 1-PropZTest...
6: 2-PropZTest...
7↓ZInterval...


TI-83/84 PLUS

2-SampZTest
Inpt: Data Stats
σ1: 1045.7
σ2: 1361.95
x̄1: 4446.25
n1: 250
x̄2: 4567.24
↓n2: 250


TI-83/84 PLUS

2-SampZTest
↑σ2: 1361.95
x̄1: 4446.25
n1: 250
x̄2: 4567.24
n2: 250
μ1: ≠μ2 <μ2 >μ2
Calculate Draw


TI-83/84 PLUS

2-SampZTest
μ1 ≠ μ2
z = -1.114106102
p = .2652337599
x̄1 = 4446.25
x̄2 = 4567.24
↓n1 = 250
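
The 2-SampZTest output can be reproduced from the inputs on the calculator screens. The sketch below is one way to do this in Python with the standard library; the function name `two_sample_z` is an assumption for illustration, and both sample sizes are taken to be 250 as the screens appear to show.

```python
import math
from statistics import NormalDist


def two_sample_z(x_bar1, sigma1, n1, x_bar2, sigma2, n2):
    """Return (z, two-tailed P-value) for H0: mu1 = mu2 with known standard deviations."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (x_bar1 - x_bar2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Inputs as read from the TI-83/84 Plus screens.
z, p_value = two_sample_z(4446.25, 1045.7, 250, 4567.24, 1361.95, 250)

print(f"z = {z:.4f}, P-value = {p_value:.4f}")
```

The printed values can be compared with the z and p lines of the final calculator screen.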

(See Example 2, page 445.)


TI-83/84 PLUS

EDIT CALC TESTS
1: Z-Test...
2: T-Test...
3: 2-SampZTest...
4: 2-SampTTest...
5: 1-PropZTest...
6: 2-PropZTest...
7↓ZInterval...


TI-83/84 PLUS

2-SampTTest
Inpt: Data Stats
x̄1: 1275
Sx1: 45
n1: 14
x̄2: 1250
Sx2: 30
↓n2: 16


TI-83/84 PLUS

2-SampTTest
↑n1: 14
x̄2: 1250
Sx2: 30
n2: 16
μ1: ≠μ2 <μ2 >μ2
Pooled: No Yes
Calculate Draw


TI-83/84 PLUS

2-SampTTest
μ1 > μ2
t = 1.811
p = .0404131295
df = 28
x̄1 = 1275
↓x̄2 = 1250

(See Example 1, page 463.)


TI-83/84 PLUS

EDIT CALC TESTS
1: Z-Test...
2: T-Test...
3: 2-SampZTest...
4: 2-SampTTest...
5: 1-PropZTest...
6: 2-PropZTest...
7↓ZInterval...


TI-83/84 PLUS

2-PropZTest
x1: 129
n1: 150
x2: 148
n2: 200
p1: ≠p2 <p2 >p2
Calculate Draw


TI-83/84 PLUS

2-PropZTest
p1 ≠ p2
z = 2.734478928
p = .0062480166
↓p̂ = .7914285714
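
The 2-PropZTest output can be checked with a short Python sketch. The first-sample inputs are hard to read on the input screen, so x1 = 129 and n1 = 150 are assumptions here; they are chosen because, together with x2 = 148 and n2 = 200, they reproduce the pooled proportion .7914285714 shown on the output screen. The function name `two_proportion_z` is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-tailed P-value, pooled proportion) for H0: p1 = p2."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # weighted estimate of the common proportion
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p_hat1 - p_hat2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_pooled


# x1 and n1 are assumed values (see the note above); x2 and n2 are read from the screen.
z, p_value, p_pooled = two_proportion_z(x1=129, n1=150, x2=148, n2=200)

print(f"z = {z:.4f}, P-value = {p_value:.5f}, pooled proportion = {p_pooled:.6f}")
```

The normal sampling distribution is appropriate only when the expected counts in each sample are at least 5, which should be checked separately.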

1. In a survey of 1000 people who attend community college, 13% are age 40 or
older. (Adapted from American Association of Community Colleges)

(a) Construct a 95% confidence interval for the proportion of people who
    attend community college that are age 40 or older.

(b) A researcher claims that more than 10% of people who attend
    community college are age 40 or older. At α = 0.05, can you support
    the researcher’s claim? Interpret the decision in the context of the
    original claim.


2. Gas Mileage The table shows the gas mileages (in miles per gallon) of eight
cars with and without using a fuel additive. At α = 0.10, is there enough
evidence to conclude that the additive improved gas mileage?


Car                                  1     2     3     4     5     6     7     8
Gas mileage without additive      23.1  25.4  21.9  24.3  19.9  21.2  25.9  24.8
Gas mileage with fuel additive    43.6  27.7  23.6  26.8  22.1  22.4  26.3  26.6


In Exercises 3-6, construct the indicated confidence interval for the population
mean μ. Which distribution did you use to create the confidence interval?


3. c = 0.95, x̄ = 26.97, s = 3.4, n = 42
4. c = 0.90, x̄ = 3.46, s = 1.63, n = 16
5. c = 0.99, x̄ = 12.1, s = 2.64, n = 26
6. c = 0.95, x̄ = 8.21, s = 0.62, n = 8


7. A pediatrician claims that the mean birth weight of a single-birth baby is 
greater than the mean birth weight of a baby that has a twin. The mean birth 
weight of a random sample of 85 single-birth babies is 3086 grams with a 
standard deviation of 563 grams. The mean birth weight of a random sample 
of 68 babies that have a twin is 2263 grams with a standard deviation of 624 
grams. At α = 0.10, can you support the pediatrician’s claim? Interpret the 
decision in the context of the original claim. 


In Exercises 8-11, use the given statement to represent a claim. Write its complement
and state which is H₀ and which is Hₐ.


8. μ < 33        9. p = 0.19
10. σ = 0.63     11. μ ≠ 2.28


12. The mean number of chronic medications taken by a random sample of
26 elderly adults in a community has a sample standard deviation of 3.1
medications. Assume the population is normally distributed. (Adapted from
The Journal of the American Medical Association)

(a) Construct a 99% confidence interval for the population variance.

(b) Construct a 99% confidence interval for the population standard
    deviation.

(c) A pharmacist believes that the standard deviation of the mean number
    of chronic medications taken by elderly adults in the community is less
    than 2.5 medications. At α = 0.01, can you support the pharmacist’s
    claim? Interpret the decision in the context of the original claim.


13. An education organization claims that the mean SAT scores for male
athletes and male non-athletes at a college are different. A random sample
of 26 male athletes at the college has a mean SAT score of 1783 and a
standard deviation of 218. A random sample of 18 male non-athletes at the
college has a mean SAT score of 2064 and a standard deviation of 186. At
α = 0.05, can you support the organization’s claim? Interpret the decision in
the context of the original claim. Assume the populations are normally
distributed and the population variances are equal.


14. The annual earnings for 26 randomly selected translators are shown
below. Assume the population is normally distributed. (Adapted from
U.S. Bureau of Labor Statistics)


39,023 36,340 40,517 43,351 43,136 44,504 33,873 39,204
42,853 36,864 37,952 35,207 34,777 37,163 37,724 34,033
38,288 38,738 40,217 38,844 38,949 38,831 43,533 39,613
39,336 38,438


(a) Construct a 95% confidence interval for the population mean
    annual earnings for translators.

(b) A researcher claims that the mean annual earnings for translators
    is $40,000. At α = 0.05, can you reject the researcher’s claim?
    Interpret the decision in the context of the original claim.


15. A medical research team studied the number of head and neck injuries
sustained by hockey players. Of the 319 players who wore a full-face shield,
195 sustained an injury. Of the 323 players who wore a half-face shield, 204
sustained an injury. At α = 0.10, can you reject the claim that the proportions
of players sustaining head and neck injuries are the same for the two groups?
Interpret the decision in the context of the original claim. (Source: The
Journal of the American Medical Association)


16. A random sample of 40 ostrich eggs has a mean incubation period of 42 days
and a standard deviation of 1.6 days.

(a) Construct a 95% confidence interval for the population mean incubation
    period.

(b) A zoologist claims that the mean incubation period for ostriches is at
    least 45 days. At α = 0.05, can you reject the zoologist’s claim? Interpret
    the decision in the context of the original claim.

CORRELATION
AND REGRESSION


9.1 Correlation
    ACTIVITY
9.2 Linear Regression
    ACTIVITY
    CASE STUDY
9.3 Measures of Regression and Prediction Intervals
9.4 Multiple Regression
    USES AND ABUSES
    REAL STATISTICS - REAL DECISIONS
    TECHNOLOGY


In 2009, the New York Yankees had the
highest team salary in Major League Baseball
at $201.4 million and the Florida Marlins had
the lowest team salary at $36.8 million. In the
same year, the Los Angeles Dodgers had the
highest average attendance at 46,440 and the
Oakland Athletics had the lowest average
attendance at 17,392.

WHERE YOU'VE BEEN


In Chapters 1-8, you studied descriptive statistics,
probability, and inferential statistics. One of the
techniques you learned in descriptive statistics was
graphing paired data with a scatter plot (Section
2.2). For instance, the salaries and average
attendances at home games for the teams in Major
League Baseball in 2009 are shown in graphical
form at the right and in tabular form below.


73.55 96.7 


67.1 121.7) 134.8 96.1 | 73.6 81.6 
36.8 103.0 70.5 


113.7 


113.0 48.7 | 43.7 


[Scatter plot: Major League Baseball. Horizontal axis: Salary (in millions of dollars); vertical axis: Average attendance per home game]


75.2 | 115.1 


26,281 | 29,304 23,545 37,811 | 39,610 | 28,199 | 21,579 22,492 32,902 | 31,693 


100.4 80.2 65.3 1494 201.4 62.3 


18,770 31,124 22,473 40,004 46,440 37,499 29,466 38,941 45,364 17,392 


82.6 989 885 63.3 | 68.2 80.5 60.3 


44,453 19,479 23,735 35,322 | 27,116 | 41,274 | 23,147 27,641 23,162 | 22,715 


WHERE YOU’RE GOING


In this chapter, you will study how to describe and
test the significance of relationships between two
variables when data are presented as ordered
pairs. For instance, in the scatter plot above, it
appears that higher team salaries tend to
correspond to higher average attendances and
lower team salaries tend to correspond to lower
average attendances. This relationship is
described by saying that the team salaries are
positively correlated to the average attendances.
Graphically, the relationship can be described by
drawing a line, called a regression line, that fits
the points as closely as possible, as shown below.
The second scatter plot below shows the salaries
and wins for the teams in Major League Baseball
in 2009. From the scatter plot, it appears that there
is a weak positive correlation between the team
salaries and wins.


[Scatter plot with regression line: Major League Baseball. Horizontal axis: Salary (in millions of dollars); vertical axis: Average attendance per home game]

[Scatter plot: Major League Baseball, salaries and wins. Horizontal axis: Salary (in millions of dollars)]

CHAPTER 9 CORRELATION AND REGRESSION


9.1 Correlation


WHAT YOU SHOULD LEARN

» An introduction to linear correlation, independent and dependent
  variables, and the types of correlation
» How to find a correlation coefficient
» How to test a population correlation coefficient ρ using a table
» How to perform a hypothesis test for a population correlation
  coefficient ρ
» How to distinguish between correlation and causation


An Overview of Correlation » Correlation Coefficient » Using a Table to
Test a Population Correlation Coefficient ρ » Hypothesis Testing for
a Population Correlation Coefficient ρ » Correlation and Causation


> AN OVERVIEW OF CORRELATION


Suppose a safety inspector wants to determine whether a relationship exists 
between the number of hours of training for an employee and the number of 
accidents involving that employee. Or suppose a psychologist wants to know 
whether a relationship exists between the number of hours a person sleeps each 
night and that person’s reaction time. How would he or she determine if any 
relationship exists? 

In this section, you will study how to describe what type of relationship, or 
correlation, exists between two quantitative variables and how to determine 
whether the correlation is significant. 


DEFINITION 


A correlation is a relationship between two variables. The data can be 
represented by the ordered pairs (x, y), where x is the independent (or 
explanatory) variable and y is the dependent (or response) variable. 


In Section 2.2, you learned that the graph of ordered pairs (x, y) is called a 
scatter plot. In a scatter plot, the ordered pairs (x, y) are graphed as points in a 
coordinate plane. The independent (explanatory) variable x is measured by the 
horizontal axis, and the dependent (response) variable y is measured by the 
vertical axis. A scatter plot can be used to determine whether a linear (straight 
line) correlation exists between two variables. The following scatter plots show 
several types of correlation. 


[Scatter plots showing four types of correlation: negative linear correlation (as x increases, y tends to decrease); positive linear correlation (as x increases, y tends to increase); No Correlation; Nonlinear Correlation]

EXAMPLE 1


> Constructing a Scatter Plot


An economist wants to determine whether there is a linear relationship
between a country’s gross domestic product (GDP) and carbon dioxide (CO₂)
emissions. The data are shown in the following table. Display the data in a
scatter plot and determine whether there appears to be a positive or negative
linear correlation or no linear correlation. (Source: World Bank and U.S. Energy
Information Administration)

GDP (trillions of $), x    CO₂ emissions, y
1.6                        428.2
3.6                        828.8
[illegible]                1214.2
1.1                        444.6
0.9                        [illegible]
2.9                        415.3
2.7                        571.8
2.3                        454.9
1.6                        358.7
1.5                        [illegible]

> Solution

The scatter plot is shown at the right. From the scatter plot, it
appears that there is a positive linear correlation between the
variables.

[Scatter plot. Horizontal axis: GDP (in trillions of dollars); vertical axis: CO₂ emissions]

Interpretation Reading from left to right, as the gross domestic
products increase, the carbon dioxide emissions tend to increase.


> Try It Yourself 1


A director of alumni affairs at a small college wants to determine whether
there is a linear relationship between the number of years alumni classes have
been out of school and their annual contributions (in thousands of dollars).
The data are shown in the following table. Display the data in a scatter plot
and determine the type of correlation.

Years out of school, x    Annual contribution (1000s of $), y
1                         12.5
[illegible]               [illegible]
[illegible]               [illegible]
15                        5.2
3                         [illegible]
24                        3.1
30                        2.7

a. Draw and label the x- and y-axes.

b. Plot each ordered pair.

c. Does there appear to be a linear correlation? If so, interpret the correlation
   in the context of the data. Answer: Page A44

EXAMPLE 2 


> Constructing a Scatter Plot


A student conducts a study to determine whether there is a linear relationship
between the number of hours a student exercises each week and the student’s
grade point average (GPA). The data are shown in the following table. Display
the data in a scatter plot and describe the type of correlation.


GPA, y    3.6  4.0  3.9  2.5  2.4  2.2  3.7  3.0  1.8  3.1


> Solution


The scatter plot is shown at the right.
From the scatter plot, it appears
that there is no linear correlation
between the variables.

[Scatter plot. Horizontal axis: Hours of exercise; vertical axis: Grade point average]

Interpretation The number of
hours a student exercises each week
does not appear to be related to the
student’s grade point average.

STUDY TIP

Save any data put into a technology tool because these data
will be used throughout the chapter.


Duration, x    Time, y
1.80           56
1.82           58
1.90           62
1.93           56
1.98           57
2.05           57
2.13           60
2.30           57
2.37           61
2.82           73
3.13           76
3.27           77
3.65           77
3.78           79
3.83           85
3.88           80
4.10           89
4.27           90
4.30           89
4.43           89
4.47           86
4.53           89
4.55           86
4.60           92
4.63           91


STUDY TIP

You can also use MINITAB and Excel to construct scatter plots.


> Try It Yourself 2


A researcher conducts a study to determine whether there is a linear
relationship between a person’s height (in inches) and pulse rate (in beats
per minute). The data are shown in the following table. Display the data in a
scatter plot and describe the type of correlation.


Height, x        68  72  65   70   62  75  78  64  68
Pulse rate, y    90  85  88  100  105  98  70  65  72


a. Draw and label the x- and y-axes. 

b. Plot each ordered pair. 

c. Does there appear to be a linear correlation? If so, interpret the correlation 
in the context of the data. Answer: Page A44 


EXAMPLE 3


> Constructing a Scatter Plot Using Technology


Old Faithful, located in Yellowstone National Park, is the world’s most famous
geyser. The durations (in minutes) of several of Old Faithful’s eruptions and
the times (in minutes) until the next eruption are shown in the table above.
Using a TI-83/84 Plus, display the data in a scatter plot. Describe the type of
correlation.


> Solution

Begin by entering the x-values into List 1 and the y-values into List 2. Use Stat
Plot to construct the scatter plot. The plot should look similar to the one shown
below. From the scatter plot, it appears that the variables have a positive
linear correlation.


[TI-83/84 Plus screens: Stat Plot setup (scatter plot type, Xlist: L1, Ylist: L2) and the resulting scatter plot]


Interpretation You can conclude that the longer the duration of the 
eruption, the longer the time before the next eruption begins. 


> Try It Yourself 3 


Consider the data from the Chapter Opener on page 483 on the salaries and 
average attendances at home games for the teams in Major League Baseball. 
Use a technology tool to display the data in a scatter plot. Describe the type of 
correlation. 


a. Enter the data into List 1 and List 2. 

b. Construct the scatter plot. 

c. Does there appear to be a linear correlation? If so, interpret the correlation 
in the context of the data. Answer: Page A44 
INSIGHT

The formal name for r is the Pearson product moment correlation
coefficient. It is named after the English statistician Karl Pearson
(1857-1936). (See page 33.)




> CORRELATION COEFFICIENT 


Interpreting correlation using a scatter plot can be subjective. A more precise 
way to measure the type and strength of a linear correlation between two 
variables is to calculate the correlation coefficient. Although a formula for the 
sample correlation coefficient is given, it is more convenient to use a technology 
tool to calculate this value. 


DEFINITION 


The correlation coefficient is a measure of the strength and the direction of a
linear relationship between two variables. The symbol r represents the sample
correlation coefficient. A formula for r is

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

where n is the number of pairs of data.

The population correlation coefficient is represented by ρ (the lowercase
Greek letter rho, pronounced “row”).


The range of the correlation coefficient is −1 to 1, inclusive. If x and y have
a strong positive linear correlation, r is close to 1. If x and y have a strong
negative linear correlation, r is close to −1. If x and y have perfect positive
linear correlation or perfect negative linear correlation, r is equal to 1 or −1,
respectively. If there is no linear correlation or a weak linear correlation, r is close
to 0. It is important to remember that if r is close to 0, it does not mean that there
is no relation between x and y, just that there is no linear relation. Several
examples are shown below.


[Figure: six scatter plots illustrating values of r]

Perfect positive correlation (r = 1): x = number of adult movie tickets
Strong positive correlation (r = 0.81): x = height (in inches)
Weak positive correlation (r = 0.45): x = income per year (in thousands of dollars)
Perfect negative correlation (r = −1): x = number incorrect
Strong negative correlation (r = −0.92): x = number of absences
No correlation (r = 0.04): x = IQ score
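The caution that r close to 0 rules out only a linear relation can be checked numerically. The sketch below is illustrative only: the data are hypothetical, and `pearson_r` is a helper name introduced here that applies the formula for r given in the definition.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient computed from the five sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: y is completely determined by x, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs]       # points on a parabola
ys_line = [2 * x + 1 for x in xs]     # points on a straight line

r_curve = pearson_r(xs, ys_curve)
r_line = pearson_r(xs, ys_line)
print("parabola:", r_curve)
print("line:    ", r_line)
```

For the parabola the positive and negative products cancel, so r comes out as 0 even though y depends entirely on x. For the straight line r is 1.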
STUDY TIP

Σx² means square each value and add the squares. (Σx)² means add the
values and square the sum.


STUDY TIP

Notice that the correlation coefficient r in Example 4 is rounded to three
decimal places. This round-off rule will be used throughout the text.


GUIDELINES 


Calculating a Correlation Coefficient 


IN WORDS                                           IN SYMBOLS
1. Find the sum of the x-values.                   Σx
2. Find the sum of the y-values.                   Σy
3. Multiply each x-value by its corresponding      Σxy
   y-value and find the sum.
4. Square each x-value and find the sum.           Σx²
5. Square each y-value and find the sum.           Σy²
6. Use these five sums to calculate the correlation coefficient:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$


EXAMPLE 4 


> Finding the Correlation Coefficient 


Calculate the correlation coefficient for the gross domestic products and 
carbon dioxide emissions data given in Example 1. What can you conclude? 


> Solution Use a table to help calculate the correlation coefficient. 


GDP, x   CO2 emissions, y   xy          x²       y²
1.6      428.2              685.12      2.56     183,355.24
3.6      828.8              2983.68     12.96    686,909.44
4.9      1214.2             5949.58     24.01    1,474,281.64
1.1      444.6              489.06      1.21     197,669.16
0.9      264.0              237.6       0.81     69,696
2.9      415.3              1204.37     8.41     172,474.09
2.7      571.8              1543.86     7.29     326,955.24
2.3      454.9              1046.27     5.29     206,934.01
1.6      358.7              573.92      2.56     128,665.69
1.5      573.5              860.25      2.25     328,902.25

Σx = 23.1   Σy = 5554   Σxy = 15,573.71   Σx² = 67.35   Σy² = 3,775,842.76


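With these sums and n = 10, the correlation coefficient is

$$\begin{aligned}
r &= \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \\
  &= \frac{10(15{,}573.71) - (23.1)(5554)}{\sqrt{10(67.35) - 23.1^2}\,\sqrt{10(3{,}775{,}842.76) - 5554^2}} \\
  &= \frac{27{,}439.7}{\sqrt{139.89}\,\sqrt{6{,}911{,}511.6}} \approx 0.882.
\end{aligned}$$

The result r ≈ 0.882 suggests a strong positive linear correlation.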


Interpretation As the gross domestic product increases, the carbon dioxide 
emissions also increase. 
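The hand calculation in Example 4 can be reproduced with a few lines of code. This is a minimal sketch, not a prescribed method: the data pairs are keyed in from the Example 4 table, and the variable names are chosen here for illustration.

```python
import math

# (GDP, CO2 emissions) pairs keyed in from the table in Example 4.
gdp = [1.6, 3.6, 4.9, 1.1, 0.9, 2.9, 2.7, 2.3, 1.6, 1.5]
co2 = [428.2, 828.8, 1214.2, 444.6, 264.0, 415.3, 571.8, 454.9, 358.7, 573.5]

n = len(gdp)

# Steps 1-5 of the guidelines: the five sums.
sum_x = sum(gdp)
sum_y = sum(co2)
sum_xy = sum(x * y for x, y in zip(gdp, co2))
sum_x2 = sum(x * x for x in gdp)
sum_y2 = sum(y * y for y in co2)

# Step 6: combine the sums into r.
numerator = n * sum_xy - sum_x * sum_y
denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
               * math.sqrt(n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"sum x = {sum_x:.1f}, sum y = {sum_y:.1f}, sum xy = {sum_xy:.2f}")
print(f"sum x^2 = {sum_x2:.2f}, sum y^2 = {sum_y2:.2f}")
print(f"r = {r:.3f}")
```

The printed sums can be compared against the totals row of the table before trusting the final value of r.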
> Try It Yourself 4


Calculate the correlation coefficient for the number of years out of school and
annual contribution data given in Try It Yourself 1. What can you conclude?


a. Identify n and use a table to help calculate Σx, Σy, Σxy, Σx², and Σy².
b. Use the resulting sums and n to calculate r.
c. What can you conclude? Answer: Page A44


10 8.7
5 14.6
15 5.2
24 3.1


EXAMPLE 5


» Using Technology to Find a Correlation Coefficient 


Use a technology tool to calculate the correlation coefficient for the Old 
Faithful data given in Example 3. What can you conclude? 


> Solution 


MINITAB, Excel, and the TI-83/84 Plus each have features that allow you to 
calculate a correlation coefficient for paired data sets. Try using this technology 
to find r. You should obtain results similar to the following. 


MINITAB

Correlations: C1, C2
Pearson correlation of C1 and C2 = 0.979


EXCEL

0.978659


Before using the TI-83/84 Plus to calculate r, you must enter the Diagnostic On
command. To do so, enter the following keystrokes:

[2nd] [0] cursor to DiagnosticOn [ENTER] [ENTER]

To explore this topic further, see Activity 9.1 on page 500.


The following screens show how to find r using a TI-83/84 Plus with the data 
stored in List 1 and List 2. To begin, use the STAT keystroke. 


TI-83/84 PLUS (STAT menu)

EDIT CALC TESTS
1: 1-Var Stats
2: 2-Var Stats
3: Med-Med
4: LinReg(ax+b)
5: QuadReg
6: CubicReg
7: QuartReg


TI-83/84 PLUS

LinReg(ax+b) L1, L2


TI-83/84 PLUS

LinReg
y=ax+b
a=12.48094391
b=33.68290034
r²=.9577738551
r=.9786592129   (correlation coefficient)


The result r ~ 0.979 suggests a strong positive linear correlation. 
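For readers without MINITAB, Excel, or a TI-83/84 Plus, the same figures can be obtained with a short script. The sketch below keys in the Old Faithful data from the Example 3 table, reading the time paired with the 3.78-minute eruption as 79 (an assumption about a hard-to-read entry). The slope and intercept use the standard least-squares formulas, which this section has not yet introduced; they are included only so the output can be compared with the LinReg screen.

```python
import math

# Old Faithful data from Example 3 (minutes).
duration = [1.80, 1.82, 1.90, 1.93, 1.98, 2.05, 2.13, 2.30, 2.37, 2.82,
            3.13, 3.27, 3.65, 3.78, 3.83, 3.88, 4.10, 4.27, 4.30, 4.43,
            4.47, 4.53, 4.55, 4.60, 4.63]
wait = [56, 58, 62, 56, 57, 57, 60, 57, 61, 73,
        76, 77, 77, 79, 85, 80, 89, 90, 89, 89,
        86, 89, 86, 92, 91]

n = len(duration)
mean_x = sum(duration) / n
mean_y = sum(wait) / n

# Sums of squared deviations and cross products about the means.
s_xx = sum((x - mean_x) ** 2 for x in duration)
s_yy = sum((y - mean_y) ** 2 for y in wait)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(duration, wait))

r = s_xy / math.sqrt(s_xx * s_yy)   # correlation coefficient
a = s_xy / s_xx                     # slope, as in y = ax + b
b = mean_y - a * mean_x             # intercept

print(f"a = {a:.4f}, b = {b:.4f}, r^2 = {r * r:.4f}, r = {r:.4f}")
```

This deviation form of r is algebraically equivalent to the five-sum formula in the definition; it is used here because it shares its intermediate quantities with the slope.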


> Try It Yourself 5 

Calculate the correlation coefficient for the data from the Chapter Opener on 
page 483 on the salaries and average attendances at home games for the teams 
in Major League Baseball. What can you conclude? 


a. Enter the data. 
b. Use the appropriate feature to calculate r. 
c. What can you conclude? Answer: Page A44 
STUDY TIP

The level of significance is denoted by α, the lowercase Greek
letter alpha.


STUDY TIP

If you determine that the linear correlation is significant, then
you will be able to proceed to write the equation for the line that
best describes the data. This line, called the regression line, can be
used to predict the value of y when given a value of x. You will learn
how to write this equation in the next section.


> USING A TABLE TO TEST A POPULATION
CORRELATION COEFFICIENT ρ


Once you have calculated r, the sample correlation coefficient, you will want to
determine whether there is enough evidence to decide that the population
correlation coefficient ρ is significant. In other words, based on a few pairs of
data, can you make an inference about the population of all such data pairs?
Remember that you are using sample data to make a decision about population
data, so it is always possible that your inference may be wrong. In correlation
studies, the small percentage of times when you decide that the correlation is
significant when it is really not is called the level of significance. It is typically set
at α = 0.01 or 0.05. When α = 0.05, you will decide that the population
correlation coefficient is significant when it is really not about 5% of the time.
(The other 95% of the time, you will correctly conclude that such a correlation
is not significant.) When α = 0.01, you will make this type of error only
1% of the time. When using a lower level of significance, however, you may fail
to identify some significant correlations.

In order for a correlation coefficient to be significant, its absolute value must
be sufficiently close to 1; how close depends on the number of data pairs. To
determine whether the population correlation coefficient ρ is
significant, use the critical values given in Table 11 in Appendix B. A portion of
the table is shown below. If |r| is greater than the critical value, there is enough
evidence to decide that the correlation is significant. Otherwise, there is not
enough evidence to say that the correlation is significant. For instance, to
determine whether ρ is significant for five pairs of data (n = 5) at a level of
significance of α = 0.01, you need to compare |r| with a critical value of 0.959, as
shown in the table.

Critical values for α = 0.05 and α = 0.01

n (number of pairs of data in sample)   α = 0.05   α = 0.01
6                                       0.811      0.917

If |r| > 0.959, the correlation is significant. Otherwise, there is not enough
evidence to conclude that the correlation is significant. The guidelines for this
process are as follows.


GUIDELINES 


Using Table 11 for the Correlation Coefficient ρ

IN WORDS                                  IN SYMBOLS
1. Determine the number of pairs          Determine n.
   of data in the sample.
2. Specify the level of significance.     Identify α.
3. Find the critical value.               Use Table 11 in Appendix B.
4. Decide if the correlation is           If |r| > critical value, the
   significant.                           correlation is significant.
                                          Otherwise, there is not enough
                                          evidence to conclude that the
                                          correlation is significant.
5. Interpret the decision in the
   context of the original claim.
EXAMPLE 6


> Using Table 11 for a Correlation Coefficient


In Example 5, you used 25 pairs of data to find r ≈ 0.979. Is the correlation
coefficient significant? Use α = 0.05.


> Solution

The number of pairs of data is 25, so n = 25. The level of significance
is α = 0.05. Using Table 11, find the critical value in the α = 0.05 column
that corresponds to the row with n = 25. The number in that column and row
is 0.396.

INSIGHT

Notice that the fewer the data points in your study, the stronger
the evidence has to be to conclude that the correlation coefficient
is significant.


n     α = 0.05   α = 0.01
4     ...        0.990
5     ...        0.959
6     0.811      0.917
7     ...        0.875
8     ...        0.834
9     ...        0.798
10    0.632      0.765
11    ...        0.735
12    ...        0.708
13    ...        0.684
14    ...        0.661
...
19    ...        0.575
20    ...        0.561
21    ...        0.549
22    ...        0.537
23    ...        0.526
24    ...        0.515
25    0.396      0.505
26    0.388      0.496
27    0.381      0.487
28    0.374      0.479
29    0.367      0.471


Because |r| ≈ 0.979 > 0.396, you can decide that the population correlation is
significant.


Interpretation There is enough evidence at the 5% level of significance to 
conclude that there is a significant linear correlation between the duration of 
Old Faithful’s eruptions and the time between eruptions. 
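The critical values in Table 11 appear to agree with what the t-test described later in this section implies: solving t = r / sqrt((1 − r²)/(n − 2)) for r gives r = t / sqrt(t² + n − 2). The sketch below checks this for three cases quoted in this section. The t critical values are taken from a standard two-tailed t-table and are inputs assumed here, and `critical_r` is a hypothetical helper name, not part of any statistics package.

```python
import math

def critical_r(t0, n):
    """Critical |r| implied by a two-tailed t critical value with n - 2 d.f."""
    return t0 / math.sqrt(t0 ** 2 + (n - 2))

# (n, alpha, two-tailed t critical value with n - 2 degrees of freedom)
cases = [
    (5, 0.01, 5.841),    # d.f. = 3
    (10, 0.05, 2.306),   # d.f. = 8
    (25, 0.05, 2.069),   # d.f. = 23
]

results = {}
for n, alpha, t0 in cases:
    results[(n, alpha)] = round(critical_r(t0, n), 3)
    print(f"n = {n}, alpha = {alpha}: critical |r| = {results[(n, alpha)]}")
```

The three results can be compared with the values 0.959, 0.632, and 0.396 that the text reads from Table 11, which suggests the table and the t-test lead to the same decision.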


> Try It Yourself 6


In Try It Yourself 4, you calculated the correlation coefficient of the number
of years out of school and annual contribution data to be r ≈ −0.908. Is the
correlation coefficient significant? Use α = 0.01.


a. Determine the number of pairs of data in the sample.

b. Identify the level of significance.

c. Find the critical value. Use Table 11 in Appendix B.

d. Compare |r| with the critical value and decide if the correlation is
significant.

e. Interpret the decision in the context of the original claim.

Answer: Page A44
> HYPOTHESIS TESTING FOR A POPULATION
CORRELATION COEFFICIENT ρ


You can also use a hypothesis test to determine whether the sample correlation
coefficient r provides enough evidence to conclude that the population correlation
coefficient ρ is significant. A hypothesis test for ρ can be one-tailed or two-tailed.
The null and alternative hypotheses for these tests are as follows.


H0: ρ ≥ 0 (no significant negative correlation)     Left-tailed test
Ha: ρ < 0 (significant negative correlation)


H0: ρ ≤ 0 (no significant positive correlation)     Right-tailed test
Ha: ρ > 0 (significant positive correlation)


H0: ρ = 0 (no significant correlation)              Two-tailed test
Ha: ρ ≠ 0 (significant correlation)


In this text, you will consider only two-tailed hypothesis tests for ρ.


THE t-TEST FOR THE CORRELATION COEFFICIENT 


A t-test can be used to test whether the correlation between two variables is
significant. The test statistic is r and the standardized test statistic

$$t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

follows a t-distribution with n − 2 degrees of freedom.


GUIDELINES 


Using the t-Test for the Correlation Coefficient ρ


IN WORDS                                            IN SYMBOLS
1. Identify the null and alternative hypotheses.    State H0 and Ha.
2. Specify the level of significance.               Identify α.
3. Identify the degrees of freedom.                 d.f. = n − 2
4. Determine the critical value(s) and              Use Table 5 in
   the rejection region(s).                         Appendix B.
5. Find the standardized test statistic.            t = r / sqrt((1 − r²)/(n − 2))
6. Make a decision to reject or fail to reject      If t is in the rejection
   the null hypothesis.                             region, reject H0.
                                                    Otherwise, fail to
                                                    reject H0.
7. Interpret the decision in the context of
   the original claim.
INSIGHT

In Example 7, you can use Table 11 in Appendix B to test the
population correlation coefficient ρ. Given n = 10 and α = 0.05,
the critical value from Table 11 is 0.632. Because
|r| ≈ 0.882 > 0.632, the correlation is significant. Note
that this is the same result you obtained using a t-test for the
population correlation coefficient ρ.


STUDY TIP

Be sure you see in Example 7 that rejecting the null hypothesis
means that there is enough evidence that the correlation
is significant.
EXAMPLE 7


> The t-Test for a Correlation Coefficient


In Example 4, you used 10 pairs of data to find r ≈ 0.882. Test the
significance of this correlation coefficient. Use α = 0.05.


> Solution
The null and alternative hypotheses are
H0: ρ = 0 (no correlation) and Ha: ρ ≠ 0 (significant correlation).


Because there are 10 pairs of data in the sample, there are 10 − 2 = 8 degrees
of freedom. Because the test is a two-tailed test, α = 0.05, and d.f. = 8, the
critical values are −t0 = −2.306 and t0 = 2.306. The rejection regions are
t < −2.306 and t > 2.306. Using the t-test, the standardized test statistic is

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.882}{\sqrt{\dfrac{1 - (0.882)^2}{10 - 2}}} \approx 5.294.$$

The following graph shows the location of the rejection regions and the
standardized test statistic.


Because t is in the rejection region, you should decide to reject the null
hypothesis.


Interpretation There is enough evidence at the 5% level of significance to
conclude that there is a significant linear correlation between gross domestic 
products and carbon dioxide emissions. 
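The steps of the t-test in Example 7 can be sketched in code as follows. The values r = 0.882, n = 10, and the critical value 2.306 come from the example; the function name `t_statistic` is introduced here for illustration only.

```python
import math

def t_statistic(r, n):
    """Standardized test statistic for the correlation coefficient."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r = 0.882          # sample correlation coefficient from Example 4
n = 10             # number of data pairs
t_critical = 2.306 # two-tailed critical value, alpha = 0.05, d.f. = 8

t = t_statistic(r, n)
reject_null = abs(t) > t_critical   # two-tailed rejection region

print(f"d.f. = {n - 2}, t = {t:.3f}, reject H0: {reject_null}")
```

Because the test is two-tailed, the comparison uses |t|, so the same check would also flag a strongly negative r.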


> Try It Yourself 7


In Try It Yourself 5, you calculated the correlation coefficient of the salaries
and average attendances at home games for the teams in Major League
Baseball to be r ≈ 0.74972. Test the significance of this correlation coefficient.
Use α = 0.01.


a. State the null and alternative hypotheses.

b. Identify the level of significance.

c. Identify the degrees of freedom.

d. Determine the critical values and the rejection regions.

e. Find the standardized test statistic.

f. Make a decision to reject or fail to reject the null hypothesis.

g. Interpret the decision in the context of the original claim.

Answer: Page A45
The following scatter plot shows the results of a survey conducted
as a group project by students in a high school statistics class in the
San Francisco area. In the survey, 125 high school students were
asked their grade point average (GPA) and the number of caffeine
drinks they consumed each day.

[Figure: scatter plot of grade point average versus caffeine drinks (cups per day)]

What type of correlation, if any, does the scatter plot show between
caffeine consumption and GPA?


» CORRELATION AND CAUSATION 


The fact that two variables are strongly correlated does not in itself imply a 
cause-and-effect relationship between the variables. More in-depth study is usually 
needed to determine whether there is a causal relationship between the variables. 

If there is a significant correlation between two variables, a researcher should 
consider the following possibilities. 


1. Is there a direct cause-and-effect relationship between the variables? 


That is, does x cause y? For instance, consider the relationship between 
gross domestic products and carbon dioxide emissions that has been 
discussed throughout this section. It is reasonable to conclude that an 
increase in a country’s gross domestic product will result in higher 
carbon dioxide emissions. 


2. Is there a reverse cause-and-effect relationship between the variables? 


That is, does y cause x? For instance, consider the Old Faithful data 
that have been discussed throughout this section. These variables have 
a positive linear correlation, and it is possible to conclude that the 
duration of an eruption affects the time before the next eruption. 
However, it is also possible that the time between eruptions affects 
the duration of the next eruption. 


3. Is it possible that the relationship between the variables can be caused 
by a third variable or perhaps a combination of several other variables? 


For instance, consider the salaries and average attendances per home 
game for the teams in Major League Baseball listed in the Chapter 
Opener. Although these variables have a positive linear correlation, 
it is doubtful that just because a team’s salary decreases, the average 
attendance per home game will also decrease. The relationship is 
probably due to several other variables, such as the economy, the 
players on the team, and whether or not the team is winning games. 


4. Is it possible that the relationship between two variables may be a 
coincidence? 


For instance, although it may be possible to find a significant 
correlation between the number of animal species living in certain 
regions and the number of people who own more than two cars in 
those regions, it is highly unlikely that the variables are directly 
related. The relationship is probably due to coincidence. 


Determining which of the cases above is valid for a data set can be difficult. 
For instance, consider the following example. Suppose a person breaks out in a 
rash each time he eats shrimp at a certain restaurant. The natural conclusion is 
that the person is allergic to shrimp. However, upon further study by an allergist, 
it is found that the person is not allergic to shrimp, but to a type of seasoning the 
chef is putting into the shrimp. 
BUILDING BASIC SKILLS AND VOCABULARY


1. Two variables have a positive linear correlation. Does the dependent
variable increase or decrease as the independent variable increases?


2. Two variables have a negative linear correlation. Does the dependent
variable increase or decrease as the independent variable increases?


3. Describe the range of values for the correlation coefficient.


4. What does the sample correlation coefficient r measure? Which value
indicates a stronger correlation: r = 0.918 or r = −0.932? Explain your
reasoning.


5. Give examples of two variables that have perfect positive linear correlation
and two variables that have perfect negative linear correlation.


6. Explain how to decide whether a sample correlation coefficient indicates
that the population correlation coefficient is significant.


7. Discuss the difference between r and ρ.


8. In your own words, what does it mean to say “correlation does not imply
causation”?


Graphical Analysis In Exercises 9-14, the scatter plots of paired data sets are
shown. Determine whether there is a perfect positive linear correlation, a strong
positive linear correlation, a perfect negative linear correlation, a strong negative
linear correlation, or no linear correlation between the variables.


[Figures: scatter plots for Exercises 9-14]
Graphical Analysis In Exercises 15-18, the scatter plots show the results of a
survey of 20 randomly selected males ages 24-35. Using age as the explanatory
variable, match each graph with the appropriate description. Explain your reasoning.


(a) Age and body temperature (b) Age and balance on student loans
(c) Age and income (d) Age and height

[Figures: scatter plots for Exercises 15-18, each with Age on the horizontal axis]

In Exercises 19 and 20, identify the explanatory variable and the response variable. 


19. A nutritionist wants to determine if the amounts of water consumed each 
day by persons of the same weight and on the same diet can be used to 
predict individual weight loss. 


20. An insurance company hires an actuary to determine whether the number of 
hours of safety driving classes can be used to predict the number of driving 
accidents for each driver. 


USING AND INTERPRETING CONCEPTS


Constructing a Scatter Plot and Determining Correlation In Exercises
21-28, (a) display the data in a scatter plot, (b) calculate the sample correlation
coefficient r, and (c) make a conclusion about the type of correlation.


21. Age and Blood Pressure The ages (in years) of 10 men and their
systolic blood pressures

Age, x                       16    25    39    45    49    64    70    29    57    22
Systolic blood pressure, y   109   122   143   132   199   185   199   130   175   118


22. Age and Vocabulary The ages (in years) of 11 children and the number
of words in their vocabulary

Age, x               ...   2     3      4      5      6
Vocabulary size, y   3     440   1200   1500   2100   2600

Age, x               ...
Vocabulary size, y   1100   2000   500   1525   2500
23. Hours Studying and Test Scores The number of hours 13 students
spent studying for a test and their scores on that test

Hours spent studying, x   0    1    2    4    4    5    5    5    6    6    7    7    8
Test score, y             40   41   51   48   64   69   73   75   68   93   84   90   95


24. Hours Online and Test Scores The number of hours 12 students spent
online during the weekend and the scores of each student who took a
test the following Monday

Hours spent online, x   0    1    2    3    3    5    5    5    6    7    7    10
Test score, y           96   85   82   74   95   68   76   84   58   65   75   50


25. Movie Budgets and Grosses The budget (in millions of dollars) and
worldwide gross (in millions of dollars) for eight of the most expensive
movies ever made (Adapted from The Numbers)

Budget, x   300   258   250   210   232   230   225   207
Gross, y    961   891   937   836   391   576   419   551


26. Speed of Sound The altitude (in thousands of feet) and speed of sound
(in feet per second)

Altitude, x         0        5        10       15       20       25
Speed of sound, y   1116.3   1096.9   1077.3   1057.2   1036.8   1015.8

Altitude, x         30      35      40      45      50
Speed of sound, y   994.5   969.0   967.7   967.7   967.7


27. Earnings and Dividends The earnings per share and dividends per
share for 12 medical supplies companies in a recent year (Source: The
Value Line Investment Survey)

Earnings per share, x    6.00   1.44   4.44   3.38   3.63   4.46
Dividends per share, y   2.45   0.15   0.62   0.91   0.68   1.14

Earnings per share, x    3.80   1.43   1.88   4.57   4.28   2.92
Dividends per share, y   0.52   0.06   0.19   1.80   0.48   0.63


28. Crimes and Arrests The number of crimes reported (in millions) and
the number of arrests reported (in millions) by the U.S. Department of
Justice for 14 years (Adapted from the National Crime Victimization Survey
and Uniform Crime Reports)

Crimes, x    1.60   1.55   1.44   1.40   1.32   1.23   1.22
Arrests, y   0.78   0.80   0.73   0.72   0.68   0.64   0.63

Crimes, x    1.23   1.22   1.18   1.16   1.19   1.21   1.20
Arrests, y   0.63   0.62   0.60   0.59   0.60   0.61   0.58
29. A student spends 1 hour studying and gets a test score of 99. Add this data
entry to the current data set in Exercise 23. Describe how adding this data
entry changes the correlation coefficient r. Why do you think it changed?


30. A student spends 12 hours online during the weekend and gets a test score of 98.
Add this data entry to the current data set in Exercise 24. Describe how adding
this data entry changes the correlation coefficient r. Why do you think it changed?


Testing Claims In Exercises 31-36, use Table 11 in Appendix B as shown
in Example 6, or perform a hypothesis test using Table 5 in Appendix B as shown
in Example 7, to make a conclusion about the indicated correlation coefficient.
If convenient, use technology to solve the problem.


31. Braking Distances: Dry Surface The weights (in pounds) of eight vehicles
and the variability of their braking distances (in feet) when stopping on a dry
surface are shown in the table. Can you conclude that there is a significant
linear correlation between vehicle weight and variability in braking distance
on a dry surface? Use α = 0.01. (Adapted from National Highway Traffic Safety
Administration)

Weight, x                              5940   5340   6500   5100   5850   4800   5600   5890
Variability in braking distance, y     1.78   1.93   1.91   1.59   1.66   1.50   1.61   1.70


32. Braking Distances: Wet Surface The weights (in pounds) of eight vehicles
and the variability of their braking distances (in feet) when stopping on a wet
surface are shown in the table. At α = 0.05, can you conclude that there is
a significant linear correlation between vehicle weight and variability in
braking distance on a wet surface? (Adapted from National Highway Traffic
Safety Administration)

Weight, x                              5890   5340   6500   4800   5940   5600   5100   5850
Variability in braking distance, y     ...


33. Hours Studying and Test Scores The table in Exercise 23 shows the number
of hours 13 students spent studying for a test and their scores on that test. At
α = 0.01, is there enough evidence to conclude that there is a significant
linear correlation between the data? (Use the value of r found in Exercise 23.)


34. Hours Online and Test Scores The table in Exercise 24 shows the number
of hours spent online and the test scores for 12 randomly selected students.
At α = 0.05, is there enough evidence to conclude that there is a significant
linear correlation between the data? (Use the value of r found in Exercise 24.)


35. Earnings and Dividends The table in Exercise 27 shows the earnings per
share and dividends per share for 12 medical supplies companies in a
recent year. At α = 0.01, can you conclude that there is a significant linear
correlation between earnings per share and dividends per share? (Use the
value of r found in Exercise 27.)


36. Crimes and Arrests The table in Exercise 28 shows the number of crimes
reported (in millions) and the number of arrests reported (in millions) by the
U.S. Department of Justice for 14 years. At α = 0.05, can you conclude that
there is a significant linear correlation between the number of crimes and the
number of arrests? (Use the value of r found in Exercise 28.)
In Exercises 37 and 38, use StatCrunch to (a) display the data in a scatter
plot, (b) calculate the correlation coefficient r, and (c) test the significance of the
correlation coefficient.


37. Earthquakes A researcher wants to determine if there is a linear
relationship between the magnitudes of earthquakes and their depths below
the surface at the epicenter. The magnitudes and depths (in kilometers)
of eight recent earthquakes are shown in the table. Use α = 0.01. (Source:
U.S. Geological Survey)

Magnitude, x   7.7   6.7   6.9   6.8   4.0   3.8   7.1   5.9
Depth, y       35    18    17    26    ...   10    25    10


38. Income Level and Charitable Donations A sociologist wants to determine
if there is a linear relationship between family income level and percent of
income donated to charities. The income levels (in thousands of dollars) and
percents of income donated to charities for seven families are shown in the
table. Use α = 0.05.


39. An earthquake is recorded with a magnitude of 6.3 and a depth of
620 kilometers. Add this data entry to the current data set in Exercise 37.
Describe how adding this data entry changes the correlation coefficient r and
your decision to reject or fail to reject the null hypothesis.


40. A family has an income level of $75,000 and donates 1% of their income to
charities. Add this data entry to the current data set in Exercise 38. Describe
how adding this data entry changes the correlation coefficient r and your
decision to reject or fail to reject the null hypothesis.


EXTENDING CONCEPTS


Interchanging x and y In Exercises 41 and 42, calculate the correlation
coefficient r, letting Row 1 represent the x-values and Row 2 the y-values. Then
calculate the correlation coefficient r, letting Row 2 represent the x-values and
Row 1 the y-values. What effect does switching the explanatory and response
variables have on the correlation coefficient?


41.
Row 1   16    25    39    45    49    64    70
Row 2   109   122   143   132   199   185   199


42.
Row 1   ...
Row 2   96   85   82   74   95   68   76   84   58   65


43. Writing Use your school’s library, the Internet, or some other reference
source to find a real-life data set with the indicated cause-and-effect
relationship. Write a paragraph describing each variable and explain why
you think the variables have the indicated cause-and-effect relationship.

(a) Direct Cause-and-Effect: Changes in one variable cause changes in the
other variable.

(b) Other Factors: The relationship between the variables is caused by a
third variable.

(c) Coincidence: The relationship between the variables is a coincidence.
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Correlation by Eye 


The correlation by eye applet allows you to guess the sample correlation 
APPLET coefficient r for a data set. When the applet loads, a data set consisting of 

20 points is displayed. Points can be added to the plot by clicking the mouse. 

Points on the plot 

can be removed by y 

clicking on the point ‘ 

and then dragging the 

point into the trash 55 ed e 

can. All of the points ° 

on the plot can be e © 

removed by simply 

clicking inside the 

trash can. You can m 

enter your guess for r ‘ 

in the “Guess” field, ° 

and then click 40 

SHOW R! to see if : 

you are within 0.1 of 

the true value. When na i . ns z A 

you. click NEW x 

DATA, a new data 

set is generated. Guess: 


True r: 


New data | Show r! | 


m Explore 


Step 1 Add five points to the plot. 

Step 2 Enter a guess for r. 

Step 3 Click SHOW R!. 

Step 4 Click NEW DATA. 

Step 5 Remove five points from the plot. 
Step 6 Enter a guess for r. 

Step 7 Click SHOW R!. 


= Draw Conclusions 


APPLET 1. Generate a new data set. Using your knowledge of correlation, try to guess the 
value of r for the data set. Repeat this 10 times. How many times were you 
correct? Describe how you chose each r value. 


2. Describe how to create a data set with a value of r that is approximately 1. 
3. Describe how to create a data set with a value of r that is approximately 0. 


4. Try to create a data set with a value of r that is approximately -0.9. Then try 
to create a data set with a value of r that is approximately 0.9. What did you 
do differently to create the two data sets? 


WHAT YOU SHOULD LEARN 


> How to find the equation of a regression line 
> How to predict y-values using a regression equation 


STUDY TIP 


When determining the equation of a regression line, it is helpful to 
construct a scatter plot of the data to check for outliers, which can 
greatly influence a regression line. You should also check for gaps 
and clusters in the data. 




SECTION 9.2 LINEAR REGRESSION 


Linear Regression 


Regression Lines > Applications of Regression Lines 


> REGRESSION LINES 


After verifying that the linear correlation between two variables is significant, 
the next step is to determine the equation of the line that best models the data. 
This line is called a regression line, and its equation can be used to predict the 
value of y for a given value of x. Although many lines can be drawn through a set 
of points, a regression line is determined by specific criteria. 

Consider the scatter plot and the line shown below. For each data point, $d_i$ 
represents the difference between the observed y-value and the predicted 
y-value for a given x-value on the line. These differences are called residuals and 
can be positive, negative, or zero. When the point is above the line, $d_i$ is positive. 
When the point is below the line, $d_i$ is negative. If the observed y-value equals the 
predicted y-value, $d_i = 0$. Of all possible lines that can be drawn through a set of 
points, the regression line is the line for which the sum of the squares of all the 
residuals 

$$\sum d_i^2$$ 

is a minimum. 


[Figure: scatter plot with a line, marking the observed y-value and the predicted 
y-value at one x-value. For a given x-value, 
d = (observed y-value) - (predicted y-value).] 


DEFINITION 


A regression line, also called a line of best fit, is the line for which the sum of 
the squares of the residuals is a minimum. 
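
The least-squares criterion in the definition can be checked numerically. The following is a minimal sketch in Python, assuming a small hypothetical data set (not from the text) and a helper named `sum_squared_residuals` introduced only for illustration. It compares a guessed line with the line that satisfies the definition for these five points.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Return the sum of d_i^2, where d_i = (observed y) - (predicted y)."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only to illustrate the definition.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# A guessed line, y = 1.0x + 1.0.
sse_guess = sum_squared_residuals(xs, ys, m=1.0, b=1.0)

# The regression line for these five points, y = 0.6x + 2.2.
sse_best = sum_squared_residuals(xs, ys, m=0.6, b=2.2)

print(f"guessed line:    sum of squared residuals = {sse_guess:.2f}")
print(f"regression line: sum of squared residuals = {sse_best:.2f}")
```

Any other choice of slope and intercept for these points gives a sum of squared residuals larger than the one for the regression line.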


In algebra, you learned that you can write 
an equation of a line by finding its slope m and 
y-intercept b. The equation has the form 


y = mx + b. 


Recall that the slope of a line is the ratio of its rise 
over its run and the y-intercept is the y-value of the 
point at which the line crosses the y-axis. It is the 
y-value when x = 0. 

In algebra, you used two points to determine the 
equation of a line. In statistics, you will use every 
point in the data set to determine the equation of the 
regression line. 
The equation of a regression line allows you to use the independent 
(explanatory) variable x to make predictions for the dependent (response) 
variable y. 

THE EQUATION OF A REGRESSION LINE 

The equation of a regression line for an independent variable x and a 
dependent variable y is 

$$\hat{y} = mx + b$$ 

where $\hat{y}$ is the predicted y-value for a given x-value. The slope m and 
y-intercept b are given by 

$$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \quad \text{and} \quad b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n}$$ 

where $\bar{y}$ is the mean of the y-values in the data set and $\bar{x}$ is the mean of the 
x-values. The regression line always passes through the point $(\bar{x}, \bar{y})$. 

STUDY TIP 

Notice that both the slope m and the y-intercept b in Example 1 are rounded 
to three decimal places. This round-off rule will be used throughout 
the text. 


EXAMPLE 1 


> Finding the Equation of a Regression Line 


Find the equation of the regression line for the gross domestic products and 
carbon dioxide emissions data used in Section 9.1. 


> Solution 

In Example 4 of Section 9.1, you found that n = 10, $\sum x = 23.1$, $\sum y = 5554$, 
$\sum xy = 15{,}573.71$, and $\sum x^2 = 67.35$. You can use these values to calculate the 
slope and y-intercept of the regression line as shown. 

$$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{10(15{,}573.71) - (23.1)(5554)}{10(67.35) - 23.1^2} = \frac{27{,}439.7}{139.89} \approx 196.151977$$ 

$$b = \bar{y} - m\bar{x} \approx \frac{5554}{10} - (196.151977)\,\frac{23.1}{10} = 555.4 - (196.151977)(2.31) \approx 102.2889$$ 

So, the equation of the regression line is 

$$\hat{y} = 196.152x + 102.289.$$ 

To sketch the regression line, use any two x-values within the range of data and 
calculate their corresponding y-values from the regression line. Then draw a line 
through the two points. The regression line and scatter plot of the data are shown 
at the right. If you plot the point $(\bar{x}, \bar{y}) = (2.31, 555.4)$, you will notice 
that the line passes through this point. 

[Sidebar table, partial (data from Section 9.1): GDP (in trillions of dollars), x, and 
CO2 emissions (in millions of metric tons), y: (3.6, 828.8), (4.9, 1214.2), (1.1, 444.6), 
(2.7, 571.8), (2.3, 454.9), (1.6, 358.7), (1.5, 573.5)] 

[Figure: scatter plot and regression line; x-axis: GDP (in trillions of dollars); 
y-axis: CO2 emissions (in millions of metric tons)] 
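
The slope and y-intercept calculation in Example 1 can be reproduced from the five summary values alone. This is a minimal sketch in Python; the function name `regression_from_sums` is an assumption made for illustration, and the inputs are the sums quoted in the example.

```python
def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (m, b) for the regression line from the summary sums."""
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * (sum_x / n)  # b = y-bar - m * x-bar
    return m, b


# Summary values quoted in Example 1 (GDP and CO2 emissions data).
m, b = regression_from_sums(n=10, sum_x=23.1, sum_y=5554,
                            sum_xy=15573.71, sum_x2=67.35)

# Round to three decimal places, following the text's round-off rule.
print(f"y-hat = {m:.3f}x + {b:.3f}")

# The line passes through (x-bar, y-bar) = (2.31, 555.4).
print(f"predicted y at x-bar: {m * 2.31 + b:.1f}")
```

Carrying the unrounded slope into the intercept calculation, as the example does, avoids compounding the rounding error.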
> Try It Yourself 1 

Find the equation of the regression line for the number of years out of school 
and annual contribution data used in Section 9.1. 

a. Identify n, $\sum x$, $\sum y$, $\sum xy$, and $\sum x^2$ from Try It Yourself 4 in Section 9.1. 
b. Calculate the slope m and the y-intercept b. 
c. Write the equation of the regression line. 

Answer: Page A45 

EXAMPLE 2 

> Using Technology to Find a Regression Equation 

Use a technology tool to find the equation of the regression line for the Old 
Faithful data used in Section 9.1. 

MINITAB, Excel, and the TI-83/84 Plus each have features that automatically 
calculate a regression equation. Try using this technology to find the regression 
equation. You should obtain results similar to the following. 

[Sidebar table: Old Faithful data from Section 9.1] 

Duration, x | Time, y | Duration, x | Time, y 
1.80 | 56 | 3.78 | 79 
1.82 | 58 | 3.83 | 85 
1.90 | 62 | 3.88 | 80 
1.98 | 37 | [illegible] | 90 
2.05 | 57 | 4.30 | 89 
2.13 | 60 | 4.43 | 89 
2.30 | 57 | 4.47 | 86 
2.37 | 61 | 4.53 | 89 
[illegible] | [illegible] | 4.55 | 86 
3.13 | 76 | 4.60 | 92 

MINITAB 

Regression Analysis: C2 versus C1 

The regression equation is [illegible] 

Predictor | Coef | SE Coef | T | P 
Constant | 33.683 | 1.894 | 17.78 | 0.000 
C1 | 12.4809 | 0.5464 | 22.84 | 0.000 

S = 2.88153   R-Sq = 95.8%   R-Sq(adj) = 95.6% 

EXCEL 

INDEX(LINEST(known_y's, known_x's), 1) 
INDEX(LINEST(known_y's, known_x's), 2) 

TI-83/84 PLUS 

LinReg 
a=12.48094391 
b=33.68290034 
r²=.9577738551 
r=.9786592129 

To explore this topic further, see Activity 9.2 on page 511. 

From the displays, you can see that the regression equation is 

$$\hat{y} = 12.481x + 33.683.$$ 

The TI-83/84 Plus display at the left shows the regression line and a scatter plot 
of the data in the same viewing window. To do this, use Stat Plot to construct 
the scatter plot and enter the regression equation as $y_1$. 
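
A general-purpose programming language can serve as the technology tool as well. The sketch below uses Python's standard `statistics` module, whose `linear_regression` function is available in Python 3.10 and later; the fallback branch and the data are assumptions added for illustration and are not the Old Faithful values.

```python
import statistics


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    if hasattr(statistics, "linear_regression"):  # Python 3.10 and later
        fit = statistics.linear_regression(xs, ys)
        return fit.slope, fit.intercept
    # Fallback for older interpreters: the summation formulas from the text.
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, sum_y / n - m * sum_x / n


# Hypothetical data, used only to show the call.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, b = fit_line(xs, ys)
print(f"y-hat = {m:.3f}x + {b:.3f}")
```

Whatever tool is used, the reported slope and intercept should agree with the hand formulas up to rounding.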


> Try It Yourself 2 


Use a technology tool to find the equation of the regression line for the 
salaries and average attendances at home games for the teams in Major 
League Baseball given in the Chapter Opener on page 483. 


a. Enter the data. 
b. Perform the necessary steps to calculate the slope and y-intercept. 
c. Specify the regression equation. Answer: Page A45 
STUDY TIP 

If the correlation between x and y is not significant, the best predicted 
y-value is $\bar{y}$, the mean of the y-values in the data set. 


The following scatter plot shows 
the relationship between the 
number of farms (in thousands) 
in a state and the total value of 
the farms (in billions of dollars). 
(Source: U.S. Department of Agriculture and 
National Agriculture Statistics Service) 


[Scatter plot: x-axis: Farms (in thousands), 50 to 250; y-axis: Total value (in billions of dollars)] 


Describe the correlation 
between these two variables 
in words. Use the scatter plot 
to predict the total value of 
farms in a state that has 
150,000 farms. The regression 
line for this scatter plot is 

y = 0.714x + 3.367. Use this 
equation to make a prediction. 
(Assume x and y have a 
significant linear correlation.) 
How does your algebraic 
prediction compare with 

your graphical one? 


> APPLICATIONS OF REGRESSION LINES 


After finding the equation of a regression line, you can use the equation to 
predict y-values over the range of the data if the correlation between x and y is 
significant. For instance, an environmentalist could forecast carbon dioxide 
emissions on the basis of gross domestic products. To predict y-values, substitute 
the given x-value into the regression equation, then calculate $\hat{y}$, the predicted 
y-value. 


EXAMPLE 3 


> Predicting y-Values Using Regression Equations 
The regression equation for the gross domestic products (in trillions of dollars) 
and carbon dioxide emissions (in millions of metric tons) data is 

y = 196.152x + 102.289. 


Use this equation to predict the expected carbon dioxide emissions for the 
following gross domestic products. (Recall from Section 9.1, Example 7, that x 
and y have a significant linear correlation.) 


1. 1.2 trillion dollars 2. 2.0 trillion dollars 3. 2.5 trillion dollars 


> Solution 


To predict the expected carbon dioxide emissions, substitute each gross 
domestic product for x in the regression equation. Then calculate y. 


1. $\hat{y} = 196.152x + 102.289 = 196.152(1.2) + 102.289 \approx 337.671$ 
   Interpretation When the gross domestic product is 1.2 trillion dollars, the CO2 
   emissions are about 337.671 million metric tons. 

2. $\hat{y} = 196.152x + 102.289 = 196.152(2.0) + 102.289 = 494.593$ 
   Interpretation When the gross domestic product is 2.0 trillion dollars, the CO2 
   emissions are about 494.593 million metric tons. 

3. $\hat{y} = 196.152x + 102.289 = 196.152(2.5) + 102.289 = 592.669$ 
   Interpretation When the gross domestic product is 2.5 trillion dollars, the CO2 
   emissions are about 592.669 million metric tons. 


Prediction values are meaningful only for x-values in (or close to) the range of 
the data. The x-values in the original data set range from 0.9 to 4.9. So, it would 
not be appropriate to use the regression line $\hat{y} = 196.152x + 102.289$ to 
predict carbon dioxide emissions for gross domestic products such as 0.2 trillion or 
14.5 trillion dollars. 
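
The substitution in Example 3, together with the warning about x-values outside the data range, can be sketched in Python as follows. The function name `predict` and the strict range check are assumptions made for illustration; the text allows x-values "in (or close to)" the range, which this sketch simplifies to "in the range".

```python
def predict(x, m, b, x_min, x_max):
    """Return the predicted y-value, or None when x lies outside the data range."""
    if not (x_min <= x <= x_max):
        return None  # extrapolation: the prediction is not meaningful
    return m * x + b


# Regression line and x-range quoted in Example 3.
M, B = 196.152, 102.289
X_MIN, X_MAX = 0.9, 4.9

for gdp in (1.2, 2.0, 2.5, 0.2, 14.5):
    y_hat = predict(gdp, M, B, X_MIN, X_MAX)
    if y_hat is None:
        print(f"x = {gdp}: outside the range of the data, no prediction")
    else:
        print(f"x = {gdp}: predicted y = {y_hat:.3f}")
```

Returning `None` instead of a number forces the caller to handle the out-of-range case explicitly rather than silently using an extrapolated value.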


> Try It Yourself 3 


The regression equation for the Old Faithful data is y = 12.481x + 33.683. 
Use this to predict the time until the next eruption for each of the following 
eruption durations. (Recall from Section 9.1, Example 6, that x and y have a 
significant linear correlation.) 


1. 2 minutes 
2. 3.32 minutes 


a. Substitute each value of x into the regression equation. 

b. Calculate y. 

c. Specify the predicted time until the next eruption for each eruption duration. 
Answer: Page A45 
9.2 EXERCISES 


BUILDING BASIC SKILLS AND VOCABULARY 


1. What is a residual? Explain when a residual is positive, negative, and zero. 


2. Two variables have a positive linear correlation. Is the slope of the regression 
line for the variables positive or negative? 


3. Explain how to predict y-values using the equation of a regression line. 


4. Given a set of data and a corresponding regression line, describe all values of 
x that provide meaningful predictions for y. 


5. In order to predict y-values using the equation of a regression line, what must 
be true about the correlation coefficient of the variables? 


6. Why is it not appropriate to use a regression line to predict y-values for 
x-values that are not in (or close to) the range of x-values found in the data? 


In Exercises 7-12, match the description in the left column with its symbol(s) in the 
right column. 


7. The y-value of a data point corresponding to $x_i$ 
8. The y-value for a point on the regression line corresponding to $x_i$ 
9. Slope 
10. y-intercept 
11. The mean of the y-values 
12. The point a regression line always passes through 

a. $\hat{y}_i$ 
b. $y_i$ 
c. b 
d. $(\bar{x}, \bar{y})$ 
e. m 
f. $\bar{y}$ 


Graphical Analysis In Exercises 13-16, match the regression equation with the 
appropriate graph. (Note that the x- and y-axes are broken.) 


13. $\hat{y} = -1.04x + 50.3$ 
14. $\hat{y} = 1.662x + 83.34$ 
15. $\hat{y} = 0.00114x + 2.53$ 
16. $\hat{y} = -0.667x + 52.6$ 

a. [Scatter plot: x-axis: Cooling capacity (in BTUs), 6100 to 6500; y-axis: Energy-efficiency rating] 
b. [Scatter plot: x-axis: Age (in years), 15 to 75; y-axis: Systolic blood pressure] 
c. [Scatter plot: x-axis: Protein (in grams), 32 to 38; y-axis: Fat (in grams)] 
d. [Scatter plot: x-axis: Work time (in hours per week), 40 to 52; y-axis: Leisure time (in hours per week)] 
USING AND INTERPRETING CONCEPTS 


Finding the Equation of a Regression Line In Exercises 17-24, find the 
equation of the regression line for the given data. Then construct a scatter plot of 
the data and draw the regression line. (Each pair of variables has a significant 
correlation.) Then use the regression equation to predict the value of y for each of 
the given x-values, if meaningful. If the x-value is not meaningful to predict the 
value of y, explain why not. If convenient, use technology to solve the problem. 


17. Atlanta Building Heights The heights (in feet) and the number of stories of 
nine notable buildings in Atlanta (Source: Emporis Corporation) 


Height, x | 869 | 820 | 771 | 696 | 692 | 676 | 656 | 492 | 486 
Stories, y | 60 | 50 | 50 | 52 | 40 | 47 | 41 | 39 | 26 


(a) x = 800 feet (b) x = 750 feet 
(c) x = 400 feet (d) x = 625 feet 


18. Square Footages and Home Sale Prices The square footages and sale prices 
(in thousands of dollars) of seven homes (Source: Howard Hanna) 


Square footage, x | 1924 | 1592 | 2413 | 2332 | 1552 | 1312 | 1278 
Sale price, y | 174.9 | 136.9 | 275.0 | 219.9 | 120.0 | 99.9 | 145.0 


(a) x = 1450 square feet (b) x = 2720 square feet 
(c) x = 2175 square feet (d) x = 1890 square feet 


19. Hours Studying and Test Scores The number of hours 13 students 
spent studying for a test and their scores on that test 


40 41 Sl 48 64 69 | 73 


75 68 93 84 90 | 95 


(a) x = 3 hours (b) x = 6.5 hours 
(c) x = 13 hours (d) x = 4.5 hours 


20. Hours Online The number of hours 12 students spent online during the 
weekend and the scores of each student who took a test the following 
Monday 


(a) x = 4 hours (b) x = 8 hours 
(c) x = 9 hours (d) x = 15 hours 
21. Hot Dogs: Caloric and Sodium Content The caloric contents and the 
sodium contents (in milligrams) of 10 beef hot dogs (Source: Consumer Reports) 


Calories, x | 150 | 170 | 120 | 120 | 90 | 180 | 170 | 140 | 90 | 110 
Sodium, y | 420 | 470 | 350 | 360 | 270 | 550 | 530 | 460 | 380 | 330 


(a) x = 170 calories (b) x = 100 calories 
(c) x = 140 calories (d) x = 210 calories 


22. High-Fiber Cereals: Caloric and Sugar Content The caloric contents 
and the sugar contents (in grams) of 11 high-fiber breakfast cereals 
(Source: Consumer Reports) 


Calories, x | 140 | 200 | 160 | 170 | 170 | 190 | 190 | 210 | 190 | 170 | 160 
Sugar, y | 6 | 9 | 6 | 9 | 10 | 47 | 13 | 18 | 19 | 10 | 10 

(a) x = 150 calories (b) x = 90 calories 
(c) x = 175 calories (d) x = 208 calories 


23. Shoe Size and Height The shoe sizes and heights (in inches) of 14 men 


Shoe size, x | 8.5 | 9.0 | 9.0 | 9.5 | 10.0 | 10.0 | 10.5 | 10.5 | 11.0 | 11.0 | 11.0 | 12.0 | 12.0 | 12.5 
Height, y | 66.0 | 68.5 | 67.5 | 70.0 | 70.0 | 72.0 | 71.5 | 69.5 | 71.5 | 72.0 | 73.0 | 73.5 | 74.0 | 74.0 


(a) x = size 11.5 (b) x = size 8.0 
(c) x = size 15.5 (d) x = size 10.0 


24. Age and Hours Slept The ages (in years) of 10 infants and the number 
of hours each slept in a day 


Age, x | 0.1 | 0.2 | 0.4 | 0.7 | 0.6 | 0.9 | 0.1 | 0.2 | 0.4 | 0.9 
Hours slept, y | 14.9 | 14.5 | 13.9 | 14.1 | 13.9 | 13.7 | 14.3 | 13.9 | 14.0 | 14.1 


(a) x = 0.3 year (b) x = 3.9 years 
(c) x = 0.6 year (d) x = 0.8 year 
Registered Nurse Salaries In Exercises 25-29, use the following 
information. You work for a salary analyst and gather the data shown in the table. 
The table shows the years of experience of 14 registered nurses and their annual 
salaries. (Adapted from Payscale, Inc.) 

[Table for Exercises 25-29, partial: (years of experience, annual salary) = (5, 46.7), 
(7, 50.2), (9, 53.6), (10, 54.0), (12.5, 58.4), (13, 61.8), (16, 63.9), (18, 67.5), 
(20, 64.3)] 

[Scatter plot: Registered Nurses; x-axis: Years of experience, 2 to 24; y-axis: annual salary] 

25. Correlation Using the scatter plot of the registered nurse salary data shown, 
what type of correlation, if any, do you think the data have? Explain. 

26. Regression Line Find an equation of the regression line for the data. 
Sketch a scatter plot of the data and draw the regression line. 

27. Using the Regression Line The analyst used the regression line you 
found in Exercise 26 to predict the annual salary for a registered nurse with 
28 years of experience. Is this a valid prediction? Explain your reasoning. 

28. Significant Correlation? The analyst claims that the population has a 
significant correlation for α = 0.01. Verify this claim. 

29. Cause and Effect Write a paragraph describing the cause-and-effect 
relationship between the years of experience and the annual salaries of 
registered nurses. 


In Exercises 30 and 31, use StatCrunch to (a) find the equation of the 
regression line for the data, (b) find the correlation coefficient r, and (c) construct 
the scatter plot of the data that also shows the regression line. (Each pair of 
variables has a significant correlation.) 


30. Hot Chocolates: Caloric and Fat Contents The caloric contents and the 
fat contents (in grams) of 6- to 8-ounce servings for 10 hot chocolate 
products (Source: Consumer Reports) 


Calories, x | 262 | 140 | 150 | 159 | 120 | 140 | 185 | 150 | 80 | 80 
Fat, y | 6.3 | 2.0 | 3.5 | 3.5 | 2.5 | 3.5 | 6.8 | 3 | 3 | 2.5 


31. Wins and Earned Run Averages The number of wins and the earned run 
averages (mean number of earned runs allowed per nine innings pitched) for 
eight professional baseball pitchers in the 2009 regular season (Source: ESPN) 


Wins, x | 19 | 17 | 16 | 15 | 15 | 14 | 12 | 9 
Earned run average, y | 2.63 | 2.79 | 3.75 | 3.23 | 3.47 | 3.96 | 4.05 | 4.12 


EXTENDING CONCEPTS 
Interchanging x and y In Exercises 32 and 33, do the following. 


(a) Find the equation of the regression line for the given data, letting Row 1 
represent the x-values and Row 2 the y-values. Sketch a scatter plot of the data 
and draw the regression line. 
(b) Find the equation of the regression line for the given data, letting Row 2 
represent the x-values and Row 1 the y-values. Sketch a scatter plot of the data 
and draw the regression line. 


(c) What effect does switching the explanatory and response variables have on the 
regression line? 


32. Row 1 16 | 25 | 39 | 45 | 49 | 64 | 70 
    Row 2 109 | 122 | 143 | 132 | 199 | 185 | 199 
33. Row 2 96 | 85 | 82 | 74 | 95 | 68 | 76 | 84 | 58 | 65 


Residual Plots A residual plot allows you to assess correlation data and check 
for possible problems with a regression model. To construct a residual plot, make 
a scatter plot of $(x, y - \hat{y})$, where $y - \hat{y}$ is the residual of each y-value. If 
the resulting plot shows any type of pattern, the regression line is not a good 
representation of the relationship between the two variables. If it does not show a 
pattern—that is, if the residuals fluctuate about 0—then the regression line is a good 
representation. Be aware that if a point on the residual plot appears to be outside 
the pattern of the other points, then it may be an outlier. 
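
The points of a residual plot can be computed directly from the data and the fitted line. The following Python sketch assumes a small hypothetical data set and two helper functions, `fit_line` and `residual_points`, named here only for illustration. It produces the $(x, y - \hat{y})$ pairs that would be plotted.

```python
def fit_line(xs, ys):
    """Return (m, b) of the least-squares regression line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, sum_y / n - m * sum_x / n


def residual_points(xs, ys):
    """Return the (x, y - y_hat) pairs used in a residual plot."""
    m, b = fit_line(xs, ys)
    return [(x, y - (m * x + b)) for x, y in zip(xs, ys)]


# Hypothetical data, not from the exercises.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

points = residual_points(xs, ys)
for x, res in points:
    print(f"x = {x}: residual = {res:+.2f}")
```

For a least-squares line the residuals always sum to zero, so the check that matters is whether they scatter around 0 without a visible pattern.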


In Exercises 34 and 35, (a) find the equation of the regression line, (b) construct a 
scatter plot of the data and draw the regression line, (c) construct a residual plot, 
and (d) determine if there are any patterns in the residual plot and explain what 
they suggest about the relationship between the variables. 


| Be] 4 [il 7 [6] 3 fj] s 


y 18 11 29 18) 14 


x | 38 | 34 | 40 | 46 | 43 | 48 | 60 | 55 | 52 
y | 24 | 22 | 27 | 32 | 30 | 31 | 27 | 26 | 28 


io) 


25 | 20 | 12 


Influential Points An influential point is a point in a data set that can greatly 
affect the graph of a regression line. An outlier may or may not be an influential 
point. To determine if a point is influential, find two regression lines: one 
including all the points in the data set, and the other excluding the possible 
influential point. If the slope or y-intercept of the regression line shows significant 
changes, the point can be considered influential. An influential point can be removed 
from a data set only if there is proper justification. 
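
The two-line comparison described above can be sketched in Python. The data and the suspected point (10, 20) are hypothetical, and `fit_line` is a helper introduced for illustration. The text gives no numeric threshold for a "significant" change, so the sketch only reports both lines for inspection.

```python
def fit_line(points):
    """Return (m, b) of the least-squares line through (x, y) points."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, sum_y / n - m * sum_x / n


# Hypothetical data; (10, 20) is the possible influential point.
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5), (10, 20)]
suspect = (10, 20)

m_all, b_all = fit_line(points)
m_without, b_without = fit_line([p for p in points if p != suspect])

print(f"with the point:    y-hat = {m_all:.3f}x + {b_all:.3f}")
print(f"without the point: y-hat = {m_without:.3f}x + {b_without:.3f}")
```

Here the slope more than triples when the point is included, which is the kind of change that marks a point as influential.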


In Exercises 36 and 37, (a) construct a scatter plot of the data, (b) identify any 
possible outliers, and (c) determine if the point is influential. Explain your reasoning. 
38. Chapter Opener Consider the data from the Chapter Opener on page 483 
on the salaries and average attendances at home games for the teams in 
Major League Baseball. Is the data point (201.4, 45,364) an outlier? If so, is 
it influential? Explain. 


Transformations to Achieve Linearity When a linear model is not 
appropriate for representing data, other models can be used. In some cases, the values 
of x or y must be transformed to find an appropriate model. In a logarithmic 
transformation, the logarithms of the variables are used instead of the original 
variables when creating a scatter plot and calculating the regression line. 
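
A logarithmic transformation of y can be sketched in Python as follows. The data are hypothetical and follow $y = 3 \cdot 2^x$ exactly, so the transformed points lie on a line; `fit_line` and the variable names are assumptions made for illustration. Solving $\log y = mx + b$ for y gives $y = 10^b (10^m)^x$, which is how the exponential model is recovered at the end.

```python
import math


def fit_line(xs, ys):
    """Return (m, b) of the least-squares regression line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, sum_y / n - m * sum_x / n


# Hypothetical data that grow exponentially: y = 3 * 2**x.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

# Replace each y-value with its common logarithm, then fit a line.
log_ys = [math.log10(y) for y in ys]
m, b = fit_line(xs, log_ys)

# Solve log y = mx + b for y:  y = (10**b) * (10**m)**x.
a = 10 ** b
growth = 10 ** m
print(f"log y = {m:.4f}x + {b:.4f}")
print(f"y = {a:.3f} * {growth:.3f}^x")
```

Real data will not fall exactly on a line after the transformation, so the recovered coefficients are estimates rather than exact values.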


In Exercises 39-42, use the data shown in the table, which shows the number of 
bacteria present after a certain number of hours. 


TABLE FOR EXERCISES 39-42 

Number of hours, x | 1 | 2 | 3 | 4 | 5 | 6 | 7 
Number of bacteria, y | 165 | 280 | 468 | 780 | 1310 | 1920 | 4900 

39. Find the equation of the regression line for the data. Then construct a scatter 
plot of (x, y) and sketch the regression line with it. 

40. Replace each y-value in the table with its logarithm, log y. Find the equation 
of the regression line for the transformed data. Then construct a scatter plot 
of (x, log y) and sketch the regression line with it. What do you notice? 

41. An exponential equation is a nonlinear regression equation of the form 
$y = ab^x$. Use a technology tool to find and graph the exponential equation 
for the original data. Include a scatter plot in your graph. Note that you 
can also find this model by solving the equation log y = mx + b from 
Exercise 40 for y. 

42. Compare your results in Exercise 41 with the equation of the regression line 
and its graph in Exercise 39. Which equation is a better model for the data? 
Explain. 

In Exercises 43-46, use the data shown in the table. 

TABLE FOR EXERCISES 43-46 

x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 
y | 695 | [illegible] | 256 | 110 | [illegible] | 75 | 68 | 74 

43. Find the equation of the regression line for the data. Then construct a scatter 
plot of (x, y) and sketch the regression line with it. 

44. Replace each x-value and y-value in the table with its logarithm. Find the 
equation of the regression line for the transformed data. Then construct a 
scatter plot of (log x, log y) and sketch the regression line with it. What do 
you notice? 

45. A power equation is a nonlinear regression equation of the form $y = ax^b$. 
Use a technology tool to find and graph the power equation for the original 
data. Include a scatter plot in your graph. Note that you can also find this 
model by solving the equation log y = m(log x) + b from Exercise 44 for y. 


46. Compare your results in Exercise 45 with the equation of the regression line 
and its graph in Exercise 43. Which equation is a better model for the data? 
Explain. 


Logarithmic Equation In Exercises 47-50, use the following information and 
a technology tool. The logarithmic equation is a nonlinear regression equation of 
the form $y = a + b \ln x$. 


47. Find and graph the logarithmic equation for the data given in Exercise 23. 
48. Find and graph the logarithmic equation for the data given in Exercise 24. 


49. Compare your results in Exercise 47 with the equation of the regression line 
and its graph. Which equation is a better model for the data? Explain. 


50. Compare your results in Exercise 48 with the equation of the regression line 
and its graph. Which equation is a better model for the data? Explain. 


Regression by Eye 


The regression by eye applet allows you to interactively estimate the regression 
line for a data set. When the applet loads, a data set consisting of 20 points is 
displayed. Points can be added to the plot by clicking the mouse. 
Points on the plot can be removed by clicking on the point and then dragging the 
point into the trash can. All of the points on the plot can be removed by simply 
clicking inside the trash can. You can move the green line on the plot by clicking 
and dragging the endpoints. You should try to move the line in order to minimize 
the sum of the squares of the residuals, also known as the sum of square error 
(SSE). Note that the regression line minimizes SSE. The SSE for the green line 
and the SSE for the regression line are given below the plot. The equation of each 
line is given above the plot. Click SHOW REGRESSION LINE! to see the regression 
line in the plot. Click NEW DATA to generate a new data set. 

[Applet display: scatter plot with x-axis from 0 to 20. Green line: y = 10.017 + 0x; 
Regression line: y = 1.5 + 0.83x; Green SSE: 472.20698; Regression SSE: 178.7345] 


Explore 

Step 1 Move the endpoints of the green line to try to approximate the 
regression line. 
Step 2 Click SHOW REGRESSION LINE!. 


Draw Conclusions 


1. Click NEW DATA to generate a new data set. Try to move the green line to 
where the regression line should be. Then click SHOW REGRESSION 
LINE!. Repeat this five times. Describe how you moved each green line. 


2. On a blank plot, place 10 points so that they have a strong positive correlation. 
Record the equation of the regression line. Then, add a point in the upper-left 
corner of the plot and record the equation of the regression line. How does the 
regression line change? 


3. Remove the point from the upper-left corner of the plot. Add 10 more points 
so that there is still a strong positive correlation. Record the equation of the 
regression line. Add a point in the upper-left corner of the plot and record the 
equation of the regression line. How does the regression line change? 


4. Use the results of Exercises 2 and 3 to describe what happens to the slope of 
the regression line when an outlier is added as the sample size increases. 


Correlation of Body Measurements 


In a study published in Medicine and 
Science in Sports and Exercise (volume 
17, no. 2, page 189) the measurements of 
252 men (ages 22-81) are given. Of the 
14 measurements taken of each man, 
some have significant correlations and 
others don’t. For instance, the scatter plot 
at the right shows that the hip and 
abdomen circumferences of the men 
have a_ strong linear correlation 
(r = 0.85). The partial table shown here 
lists only the first nine rows of the data. 


Age (yr) | Weight (lb) | Height (in.) | Neck (cm) | Chest (cm) | Abdom. (cm) | Hip (cm) 
22 | 173.25 | 72.25 | 38.5 | 93.6 | 83.0 | 98.7 
22 | 154.00 | 66.25 | 34.0 | 95.8 | 87.9 | 99.2 
23 | 154.25 | 67.75 | 36.2 | 93.1 | 85.2 | 94.5 
23 | 198.25 | 73.50 | 42.1 | 99.6 | 88.6 | 104.1 
23 | 159.75 | 72.25 | 35.5 | 92.1 | 77.1 | 93.9 
23 | 188.15 | 77.50 | 38.0 | 96.6 | 85.3 | 102.5 
24 | 184.25 | 71.25 | 34.4 | 97.3 | 100.0 | 101.9 
24 | 210.25 | 74.75 | 39.0 | 104.5 | 94.4 | 107.8 
24 | 156.00 | 70.75 | 35.7 | 92.7 | 81.9 | 95.3 


[Scatter plot: Hip and Abdomen Circumferences; x-axis: Hip circumference (in centimeters), 
85 to 115; y-axis: Abdomen circumference (in centimeters)] 


Thigh (cm) | Knee (cm) | Ankle (cm) | Bicep (cm) | Forearm (cm) | Wrist (cm) | Body fat % 
58.7 | 37.3 | 23.4 | 30.5 | 28.9 | [illegible] | 6.4 
59.6 | 38.9 | 24.0 | 28.8 | 25.2 | 16.6 | 25.3 
[illegible] | 37.9 | 21.0 | 32.0 | [illegible] | 17.1 | 12.8 
63.1 | 41.7 | 25.0 | 35.6 | 30.0 | 19.2 | 11.7 
56.1 | 36.1 | 22.7 | 30.5 | 27.2 | 18.2 | 9.4 
59.1 | 37.6 | 23.2 | 31.8 | 29.7 | 18.3 | 10.3 
63.2 | 42.2 | 24.0 | 32.2 | 27.7 | 17.7 | 28.7 
66.0 | 42.0 | 25.6 | 35.7 | 30.6 | 18.8 | 20.9 
56.4 | 36.5 | 22.0 | 33.5 | 28.3 | 17.3 | 14.2 


Source: “Generalized Body Composition Prediction Equation for Men Using Simple Measurement Techniques” by K.W. Penrose et al. (1985). 
MEDICINE AND SCIENCE IN SPORTS AND EXERCISE, vol. 17, no.2, p. 189. 


EXERCISES 


1. Using your intuition, classify the following (x, y) pairs as having a weak 
correlation (0 < r < 0.5), a moderate correlation (0.5 < r < 0.8), or a 
strong correlation (0.8 < r < 1.0). 

(a) (weight, neck) 
(b) (weight, height) 
(c) (age, body fat) 
(d) (chest, hip) 
(e) (age, wrist) 
(f) (ankle, wrist) 
(g) (forearm, height) 
(h) (bicep, forearm) 
(i) (weight, body fat) 
(j) (knee, thigh) 
(k) (hip, abdomen) 
(l) (abdomen, hip) 

2. Now, use a technology tool to find the correlation coefficient for each pair in 
Exercise 1. Compare your results with those obtained by intuition. 

3. Use a technology tool to find the regression line for each pair in Exercise 1 
that has a strong correlation. 

4. Use the results of Exercise 3 to predict the following. 

(a) The neck circumference of a man whose weight is 180 pounds 
(b) The abdomen circumference of a man whose hip circumference is 100 centimeters 

5. Are there pairs of measurements that have stronger correlation coefficients 
than 0.85? Use a technology tool and intuition to reach a conclusion. 
Measures of Regression and Prediction Intervals 


WHAT YOU SHOULD LEARN 

> How to interpret the three types of variation about a regression line 
> How to find and interpret the coefficient of determination 
> How to find and interpret the standard error of estimate for a regression line 
> How to construct and interpret a prediction interval for y 


Variation about a Regression Line > The Coefficient of Determination 
> The Standard Error of Estimate > Prediction Intervals 


> VARIATION ABOUT A REGRESSION LINE 


In this section, you will study two measures used in correlation and regression 
studies: the coefficient of determination and the standard error of estimate. You 
will also learn how to construct a prediction interval for y using a regression line 
and a given value of x. Before studying these concepts, you need to understand 
the three types of variation about a regression line. 

To find the total variation, the explained variation, and the unexplained 
variation about a regression line, you must first calculate the total deviation, the 
explained deviation, and the unexplained deviation for each ordered pair $(x_i, y_i)$ 
in a data set. These deviations are shown in the graph. 

Total deviation = $y_i - \bar{y}$ 
Explained deviation = $\hat{y}_i - \bar{y}$ 
Unexplained deviation = $y_i - \hat{y}_i$ 

[Figure: a data point $(x_i, y_i)$, the regression line, and the point $(\bar{x}, \bar{y})$, 
showing the total deviation $y_i - \bar{y}$, the explained deviation $\hat{y}_i - \bar{y}$, 
and the unexplained deviation $y_i - \hat{y}_i$] 

After calculating the deviations for each data point (x;, y;), you can find the total 
variation, the explained variation, and the unexplained variation. 


STUDY TIP 

Consider the gross domestic products and carbon dioxide emissions data used 
throughout this chapter with a regression line of 

$$\hat{y} = 196.152x + 102.289.$$ 

Using the data point (2.7, 571.8), you can find the total, explained, 
and unexplained deviations as follows. 

Total deviation: $y_i - \bar{y} = 571.8 - 555.4 = 16.4$ 
Explained deviation: $\hat{y}_i - \bar{y} = 631.8994 - 555.4 = 76.4994$ 
Unexplained deviation: $y_i - \hat{y}_i = 571.8 - 631.8994 = -60.0994$ 


DEFINITION 

The total variation about a regression line is the sum of the squares of the 
differences between the y-value of each ordered pair and the mean of y. 

$$\text{Total variation} = \sum (y_i - \bar{y})^2$$ 

The explained variation is the sum of the squares of the differences between 
each predicted y-value and the mean of y. 

$$\text{Explained variation} = \sum (\hat{y}_i - \bar{y})^2$$ 

The unexplained variation is the sum of the squares of the differences between 
the y-value of each ordered pair and each corresponding predicted y-value. 

$$\text{Unexplained variation} = \sum (y_i - \hat{y}_i)^2$$ 

The sum of the explained and unexplained variations is equal to the total 
variation. 

Total variation = Explained variation + Unexplained variation 


As its name implies, the explained variation can be explained by the 
relationship between x and y. The unexplained variation cannot be explained by 
the relationship between x and y and is due to chance or other variables. 
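
The three variations, and the fact that the explained and unexplained variations add up to the total variation, can be checked with a short Python sketch. The data are hypothetical and the helper names are assumptions made for illustration; the identity holds because the predicted values come from the least-squares regression line.

```python
def fit_line(xs, ys):
    """Return (m, b) of the least-squares regression line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, sum_y / n - m * sum_x / n


def variations(xs, ys):
    """Return (total, explained, unexplained) variation about the line."""
    m, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    y_hats = [m * x + b for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)
    explained = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)
    unexplained = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    return total, explained, unexplained


# Hypothetical data, not from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

total, explained, unexplained = variations(xs, ys)
print(f"total = {total:.2f}")
print(f"explained = {explained:.2f}")
print(f"unexplained = {unexplained:.2f}")
```

For an arbitrary line that is not the regression line, the explained and unexplained sums generally do not add up to the total.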
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Janette Benson (Psychology 
Department, University of 
Denver) performed a study 
relating the age at which infants 
crawl (in weeks after birth) with 
the average monthly temperature 
six months after birth. Her 

results are based on a sample 

of 414 infants. Janette Benson 
believes that the reason for the 
correlation of temperature and 
crawling age is that parents 

tend to bundle infants in more 
restrictive clothing and blankets 
during cold months. This 
bundling doesn’t allow the infant 
as much opportunity to move and 
experiment with crawling. 


[Scatter plot: crawling age (in weeks) versus temperature (in °F)]


The correlation coefficient is 

r = -0.70. What percent of the 
variation in the data can be 
explained? What percent is 
due to chance, sampling error, 
or other factors? 
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> THE COEFFICIENT OF DETERMINATION 


You already know how to calculate the correlation coefficient r. The square of 
this coefficient is called the coefficient of determination. It can be shown that the 
coefficient of determination is equal to the ratio of the explained variation to the 
total variation. 


DEFINITION 


The coefficient of determination \(r^2\) is the ratio of the explained variation to
the total variation. That is,

\[ r^2 = \frac{\text{Explained variation}}{\text{Total variation}}. \]


It is important that you interpret the coefficient of determination correctly.
For instance, if the correlation coefficient is r = 0.90, then the coefficient of
determination is

\[ r^2 = (0.90)^2 = 0.81. \]

This means that 81% of the variation in y can be explained by the relationship
between x and y. The remaining 19% of the variation is unexplained and is due
to other factors or to sampling error.


EXAMPLE 1   SC Report 41


» Finding the Coefficient of Determination 


The correlation coefficient for the gross domestic products and carbon dioxide
emissions data as calculated in Example 4 in Section 9.1 is \(r \approx 0.882\). Find
the coefficient of determination. What does this tell you about the explained
variation of the data about the regression line? About the unexplained
variation?


> Solution
The coefficient of determination is

\[ r^2 \approx (0.882)^2 \approx 0.778. \]


Interpretation About 77.8% of the variation in the carbon dioxide emissions 
can be explained by the variation in the gross domestic products. About 22.2% 
of the variation is unexplained and is due to chance or other variables. 
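
As a small illustration of this calculation, the Python sketch below squares a correlation coefficient and reports the explained and unexplained percents. The function name is hypothetical and chosen only for this example. The two inputs are the values of r used in the text above.

```python
def coefficient_of_determination(r):
    """Return r^2 for a correlation coefficient r between -1 and 1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return r * r


for r in (0.90, 0.882):
    r2 = coefficient_of_determination(r)
    # r2 is the explained share of the variation; 1 - r2 is the unexplained share.
    print(f"r = {r}: explained {r2:.1%}, unexplained {1 - r2:.1%}")
```

Note that the sign of r is lost when it is squared, so r = 0.70 and r = -0.70 give the same coefficient of determination.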


> Try It Yourself 1 


The correlation coefficient for the Old Faithful data as calculated in Example
5 in Section 9.1 is \(r \approx 0.979\). Find the coefficient of determination. What does
this tell you about the explained variation of the data about the regression
line? About the unexplained variation?


a. Identify the correlation coefficient r.

b. Calculate the coefficient of determination \(r^2\).

c. What percent of the variation in the times is explained? What percent is 
unexplained? Answer: Page A45 
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10.0 70.0 
10.5 71.0 

9.5 70.0 
11.0 72.0 
12.0 74.0 

8.5 66.0 

9.0 68.5 
13.0 76.0 
10.5 71.5 
10.5 70.5 
10.0 72.0 

9.5 70.0
10.0 71.0 
10.5 69.5 
11.0 71.5 
12.0 73.5 
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> THE STANDARD ERROR OF ESTIMATE 


When a \(\hat{y}\)-value is predicted from an x-value, the prediction is a point estimate.
You can construct an interval estimate for y, but first you need to calculate the 
standard error of estimate. 


DEFINITION 


The standard error of estimate \(s_e\) is the standard deviation of the observed
\(y_i\)-values about the predicted \(\hat{y}\)-value for a given \(x_i\)-value. It is given by

\[ s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} \]

where n is the number of ordered pairs in the data set.


From this formula, you can see that the standard error of estimate is the 
square root of the unexplained variation divided by n — 2. So, the closer the 
observed y-values are to the predicted y-values, the smaller the standard error of 
estimate will be. 


GUIDELINES 


Finding the Standard Error of Estimate \(s_e\)


IN WORDS / IN SYMBOLS

1. Make a table that includes the column headings shown at the right.
   \(x_i\), \(y_i\), \(\hat{y}_i\), \((y_i - \hat{y}_i)\), \((y_i - \hat{y}_i)^2\)

2. Use the regression equation to calculate the predicted y-values.
   \(\hat{y}_i = mx_i + b\)

3. Calculate the sum of the squares of the differences between each observed
   y-value and the corresponding predicted y-value.
   \(\sum (y_i - \hat{y}_i)^2\)

4. Find the standard error of estimate.
   \(s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}\)


You can also find the standard error of estimate using the following formula.

\[ s_e = \sqrt{\frac{\sum y^2 - b\sum y - m\sum xy}{n - 2}} \]

This formula is easy to use if you have already calculated the slope m, the
y-intercept b, and several of the sums. For instance, the regression line for the
data set given at the left is \(\hat{y} = 1.84247x + 51.77413\), and the values of the sums
are \(\sum y^2 = 80{,}877.5\), \(\sum y = 1137\), and \(\sum xy = 11{,}940.25\). When the alternative
formula is used, the standard error of estimate is

\[ s_e = \sqrt{\frac{\sum y^2 - b\sum y - m\sum xy}{n - 2}}
       = \sqrt{\frac{80{,}877.5 - 51.77413(1137) - 1.84247(11{,}940.25)}{16 - 2}}
       \approx 0.877. \]
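
Both formulas can be compared side by side. The Python sketch below is an illustration rather than part of the original text. It uses the 16 ordered pairs from the margin table of this section and the rounded regression line quoted above, with illustrative variable names. The alternative formula subtracts large, nearly equal quantities, so it is sensitive to rounding in m and b, and the two results may differ slightly in the third decimal place.

```python
import math

# The 16 ordered pairs from the margin table of this section.
x = [10.0, 10.5, 9.5, 11.0, 12.0, 8.5, 9.0, 13.0,
     10.5, 10.5, 10.0, 9.5, 10.0, 10.5, 11.0, 12.0]
y = [70.0, 71.0, 70.0, 72.0, 74.0, 66.0, 68.5, 76.0,
     71.5, 70.5, 72.0, 70.0, 71.0, 69.5, 71.5, 73.5]
m, b = 1.84247, 51.77413   # rounded slope and y-intercept from the text
n = len(x)

# Definition: square root of the unexplained variation divided by n - 2.
unexplained = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
se_definition = math.sqrt(unexplained / (n - 2))

# Alternative formula built from the sums of y^2, y, and xy.
sum_y2 = sum(yi ** 2 for yi in y)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
se_alternative = math.sqrt((sum_y2 - b * sum_y - m * sum_xy) / (n - 2))

print(f"s_e from the definition:          {se_definition:.3f}")
print(f"s_e from the alternative formula: {se_alternative:.3f}")
```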
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CORRELATION AND REGRESSION 


EXAMPLE 2   SC Report 42


> Finding the Standard Error of Estimate 
The regression equation for the gross domestic products and carbon dioxide 
emissions data as calculated in Example 1 in Section 9.2 is 

\(\hat{y} = 196.152x + 102.289\).


Find the standard error of estimate. 


> Solution 


Use a table to calculate the sum of the squared differences of each observed
y-value and the corresponding predicted y-value.


\(x_i\)    \(y_i\)      \(\hat{y}_i\)      \(y_i - \hat{y}_i\)     \(( y_i - \hat{y}_i )^2\)
1.6    428.2     416.1322      12.0678        145.63179684
3.6    828.8     808.4362      20.3638        414.68435044
4.9   1214.2    1063.4338     150.7662     22,730.44706244
1.1    444.6     318.0562     126.5438     16,013.33331844
0.9    264.0     278.8258     −14.8258        219.80434564
2.9    415.3     671.1298    −255.8298     65,448.88656804
2.7    571.8     631.8994     −60.0994       3611.93788036
2.3    454.9     553.4386     −98.5386       9709.85568996
1.6    358.7     416.1322     −57.4322       3298.45759684
1.5    573.5     396.517      176.983      31,322.982289

                                   \(\sum\) = 152,916.020898  (Unexplained variation)

When n = 10 and \(\sum (y_i - \hat{y}_i)^2 = 152{,}916.020898\) are used, the standard error
of estimate is

\[ s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}
       = \sqrt{\frac{152{,}916.020898}{10 - 2}}
       \approx 138.255. \]

Interpretation The standard error of estimate of the carbon dioxide emissions 
for a specific gross domestic product is about 138.255 million metric tons. 


> Try It Yourself 2 


A researcher collects the data shown at the left and concludes that there is 
a significant relationship between the amount of radio advertising time (in 
minutes per week) and the weekly sales of a product (in hundreds of dollars). 
Find the standard error of estimate. Use the regression equation

\(\hat{y} = 1.405x + 7.311\).


a. Use a table to calculate the sum of the squared differences of each observed
y-value and the corresponding predicted y-value.

b. Identify the number n of ordered pairs in the data set.

c. Calculate \(s_e\).

d. Interpret the results. Answer: Page A45 
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> PREDICTION INTERVALS 


Two variables have a bivariate normal distribution if for any fixed values of x the 
corresponding values of y are normally distributed, and for any fixed values of y 
the corresponding values of x are normally distributed. 


Bivariate Normal Distribution 


Because regression equations are determined using sample data and because 
x and y are assumed to have a bivariate normal distribution, you can construct a 
prediction interval for the true value of y. To construct the prediction interval, use 
a t-distribution with n — 2 degrees of freedom. 


DEFINITION 


Given a linear regression equation \(\hat{y} = mx + b\) and \(x_0\), a specific value of x,
a c-prediction interval for y is

\[ \hat{y} - E < y < \hat{y} + E \]

where

\[ E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}. \]

The point estimate is \(\hat{y}\) and the margin of error is E. The probability that the
prediction interval contains y is c.


GUIDELINES 


Construct a Prediction Interval for y for a Specific Value of x


IN WORDS / IN SYMBOLS

1. Identify the number of ordered pairs in the data set n and the degrees of
   freedom.
   d.f. = n − 2

2. Use the regression equation and the given x-value to find the point
   estimate \(\hat{y}\).
   \(\hat{y}_i = mx_i + b\)

3. Find the critical value \(t_c\) that corresponds to the given level of
   confidence c.
   Use Table 5 in Appendix B.

4. Find the standard error of estimate \(s_e\).
   \(s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}\)

5. Find the margin of error E.
   \(E = t_c s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}\)

6. Find the left and right endpoints and form the prediction interval.


STUDY TIP

The formulas for \(s_e\) and E use the quantities \(\sum (y_i - \hat{y}_i)^2\), \((\sum x)^2\),
and \(\sum x^2\). Use a table to calculate these quantities.
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Left endpoint: y — E 
Right endpoint: \(\hat{y} + E\)
Interval: \(\hat{y} - E < y < \hat{y} + E\)
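
The steps above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not a definitive implementation: the function name and parameter names are hypothetical, and the critical value \(t_c\) is passed in after being read from a t-table rather than computed. The sample call uses the summary values that Example 3 of this section reports for the gross domestic products and carbon dioxide emissions data.

```python
import math


def prediction_interval(m, b, x0, n, sum_x, sum_x2, s_e, t_c):
    """Return (y_hat, E, left, right) for a c-prediction interval at x0.

    t_c is the critical value for n - 2 degrees of freedom, taken from a
    t-distribution table for the chosen level of confidence c.
    """
    y_hat = m * x0 + b                      # point estimate
    x_bar = sum_x / n                       # mean of the x-values
    radicand = 1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    E = t_c * s_e * math.sqrt(radicand)     # margin of error
    return y_hat, E, y_hat - E, y_hat + E


y_hat, E, left, right = prediction_interval(
    m=196.152, b=102.289, x0=3.5, n=10,
    sum_x=23.1, sum_x2=67.35, s_e=138.255, t_c=2.306,
)
print(f"point estimate = {y_hat:.3f}, margin of error = {E:.3f}")
print(f"{left:.3f} < y < {right:.3f}")
```

Because the term \((x_0 - \bar{x})^2\) appears under the square root, the margin of error grows as \(x_0\) moves away from the mean of the x-values.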
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EXAMPLE 3 G@® Report 43 


> Constructing a Prediction Interval 


Using the results of Example 2, construct a 95% prediction interval for the 
carbon dioxide emissions when the gross domestic product is $3.5 trillion. 
What can you conclude? 


> Solution 
Because n = 10, there are

10 − 2 = 8

degrees of freedom. Using the regression equation

\(\hat{y} = 196.152x + 102.289\)

and

x = 3.5,

the point estimate is

\[ \hat{y} = 196.152x + 102.289 = 196.152(3.5) + 102.289 = 788.821. \]

From Table 5, the critical value is \(t_c = 2.306\), and from Example 2,
\(s_e \approx 138.255\). Using these values, the margin of error is

\[ E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}
     \approx (2.306)(138.255)\sqrt{1 + \frac{1}{10} + \frac{10(3.5 - 2.31)^2}{10(67.35) - (23.1)^2}}
     \approx 349.424. \]

Using \(\hat{y} = 788.821\) and E = 349.424, the prediction interval is

Left Endpoint: 788.821 − 349.424 = 439.397
Right Endpoint: 788.821 + 349.424 = 1138.245

439.397 < y < 1138.245.


Interpretation You can be 95% confident that when the gross domestic
product is $3.5 trillion, the carbon dioxide emissions will be between 439.397
and 1138.245 million metric tons.


INSIGHT

The greater the difference between x and \(\bar{x}\), the wider the prediction
interval is. For instance, in Example 3 the 95% prediction intervals for
0.9 < x < 4.9 are shown below. Notice how the bands curve away from the
regression line as x gets close to 0.9 and 4.9.

[Figure: 95% prediction interval bands around the regression line. Horizontal axis: GDP (in trillions of dollars). Vertical axis: CO2 emissions (in millions of metric tons).]


> Try It Yourself 3

Construct a 95% prediction interval for the carbon dioxide emissions when the
gross domestic product is $4 trillion. What can you conclude?

a. Specify n, d.f., \(t_c\), \(s_e\).
b. Calculate \(\hat{y}\) when x = 4.
c. Calculate the margin of error E.
d. Construct the prediction interval.
e. Interpret the results. Answer: Page A45
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M@ BUILDING BASIC SKILLS AND VOCABULARY 


Graphical Analysis In Exercises 1-3, use the graph to answer the question.

[Figure for Exercises 1-3: graph showing the point \((x_i, y_i)\)]

1. Describe the total variation about a regression line in words and in symbols. 


2. Describe the explained variation about a regression line in words and in 
symbols. 


3. Describe the unexplained variation about a regression line in words and in 
symbols. 

4. The coefficient of determination \(r^2\) is the ratio of which two types of
variations? What does \(r^2\) measure? What does \(1 - r^2\) measure?


5. What is the coefficient of determination for two variables that have perfect 
positive linear correlation or perfect negative linear correlation? Interpret 
your answer. 


6. Two variables have a bivariate normal distribution. Explain what this means. 


In Exercises 7-10, use the value of the linear correlation coefficient to calculate the 
coefficient of determination. What does this tell you about the explained variation 
of the data about the regression line? About the unexplained variation? 


7. r = 0.465        8. r = −0.328
9. r = −0.957       10. r = 0.881


M@ USING AND INTERPRETING CONCEPTS 


Finding Types of Variation and the Coefficient of Determination 
In Exercises 11-18, use the data to find (a) the coefficient of determination and
interpret the result, and (b) the standard error of estimate \(s_e\) and interpret the result.


11. Stock Offerings The number of initial public offerings of stock issued
in a recent 12-year period and the total proceeds of these offerings (in
millions of U.S. dollars) are shown in the table. The equation of the
regression line is \(\hat{y} = 104.982x + 14{,}128.671\). (Source: University of Florida)

Issues, x       318      486      382       79       70       67
Proceeds, y  34,614   64,927   65,088   34,241   22,136   10,068

Issues, x       184      168      162      162       21       43
Proceeds, y  32,269   28,593   30,648   35,762   22,762   13,307
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12. Crude Oil The table shows the amounts of crude oil (in thousands of 


barrels per day) produced by the United States and the amounts of crude oil
(in thousands of barrels per day) imported by the United States for seven
years. The equation of the regression line is \(\hat{y} = -2.735x + 27{,}657.823\).
(Source: Energy Information Administration)

Produced, x    5801     5746     5681     5419     5178     5102     5064
Imported, y  11,871   11,530   12,264   13,145   13,714   13,707   13,468


13. Retail Space and Sales The table shows the total square footage (in
billions) of retailing space at shopping centers and their sales (in billions
of U.S. dollars) for 11 years. The equation of the regression line is
\(\hat{y} = 549.448x - 1881.694\). (Adapted from International Council of Shopping
Centers)

Total square footage, x     5.0     5.1     5.2     5.3     5.5     5.6
Sales, y                  893.8   933.9   980.0  1032.4  1105.3  1181.1

Total square footage, x     5.7     5.8     5.9     6.0     6.1
Sales, y                 1221.7  1277.2  1339.2  1432.6  1530.4


14. Work and Leisure Time The median number of work hours per week
and the median number of leisure hours per week for people in the
United States for 10 recent years are shown in the table. The equation of
the regression line is \(\hat{y} = -0.646x + 50.734\). (Source: Louis Harris &
Associates)

Work hours, x      43.1   46.9   47.3   46.8
Leisure hours, y   24.3   19.2   18.1   16.6

Work hours, x      50.0   50.7   50.6   50.8
Leisure hours, y   18.8   19.5   19.2   19.5


15. State and Federal Government Wages The table shows the average weekly
wages for state government employees and federal government employees
for six years. The equation of the regression line is \(\hat{y} = 1.900x - 411.976\).
(Source: U.S. Bureau of Labor Statistics)

State wages, x      754    770    791    812    844    883
Federal wages, y   1001   1043   1111   1151   1198   1248
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16. 


Voter Turnout The U.S. voting age population (in millions) and the turnout
of the voting age population (in millions) for federal elections for eight
nonpresidential election years are shown in the table. The equation of the
regression line is \(\hat{y} = 0.333x + 7.580\). (Adapted from Federal Election
Commission)

Voting age population, x   158.4   169.9   178.6   185.8
Turnout, y                  58.9    67.6    65.0    67.9

Voting age population, x   193.7   200.9   215.5   220.6
Turnout, y                  75.1    73.1    79.8    80.6


17. Taxes The table shows the gross collections (in billions of dollars) of
individual income taxes and corporate income taxes by the U.S. Internal
Revenue Service for seven years. The equation of the regression line is
\(\hat{y} = 0.415x - 186.626\). (Source: Internal Revenue Service)

Individual, x   1038    987    990   1108   1236   1366   1426
Corporate, y     211    194    231    307    381    396    354


18. Fund Assets The table shows the total assets (in billions of U.S. dollars) of
individual retirement accounts (IRAs) and federal pension plans for nine
years. The equation of the regression line is \(\hat{y} = 0.174x + 432.225\). (Source:
Investment Company Institute)

IRAs, x                    2629   2619   2533   2993   3299
Federal pension plans, y    797    860    894    958   1023

IRAs, x                    3652   4220   4736   3572
Federal pension plans, y   1072   1141   1197   1221

Constructing and Interpreting Prediction Intervals In Exercises 19-26,
construct the indicated prediction interval and interpret the results.


19. Proceeds Construct a 95% prediction interval for the proceeds from initial
public offerings in Exercise 11 when the number of issues is 450.


20. Crude Oil Construct a 95% prediction interval for the amount of crude oil
imported by the United States in Exercise 12 when the amount of crude oil
(in thousands of barrels per day) produced by the United States is 5500.


21. Retail Sales Using the results of Exercise 13, construct a 90% prediction
interval for shopping center sales when the total square footage of shopping
centers is 5.75 billion.


22. Leisure Hours Using the results of Exercise 14, construct a 90% prediction
interval for the median number of leisure hours per week when the median
number of work hours per week is 45.1.


23. Federal Government Wages When the average weekly wages of state
government employees is $800, find a 99% prediction interval for the average
weekly wages of federal government employees. Use the results of Exercise 15.
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24. Predicting Voter Turnout When the voting age population is 210 million, 
construct a 99% prediction interval for the voter turnout in federal elections. 
Use the results of Exercise 16. 


25. Taxes The U.S. Internal Revenue Service collects $1250 billion in individual 
income taxes for a given year. Construct a prediction interval for the 
corporate income taxes collected by the U.S. Internal Revenue Service. Use 
the results of Exercise 17 and c = 0.95. 


26. Total Assets The total assets in IRAs is $3800 billion. Construct a 
prediction interval for the total assets in federal pension plans. Use the 
results of Exercise 18 and c = 0.90. 


Old Vehicles In Exercises 27-33, use the information shown at the left.

[Figure for Exercises 27-33: "Keeping cars longer." The median age of vehicles on U.S. roads for eight different years. (Source: Polk Co.)]


27. Scatter Plot Construct a scatter plot of the data. Show \(\bar{y}\) and \(\bar{x}\) on the graph.


28. Regression Line Find and graph the regression line.


29. Deviation Calculate the explained deviation, the unexplained deviation,
and the total deviation for each data point.


30. Variation Find (a) the explained variation, (b) the unexplained variation,
and (c) the total variation.


31. Coefficient of Determination Find the coefficient of determination. What
can you conclude?


32. Error of Estimate Find the standard error of estimate \(s_e\) and interpret the
results.


33. Prediction Interval Construct and interpret a 95% prediction interval for
the median age of trucks in use when the median age of cars in use is 7.0 years.


34. Correlation Coefficient and Slope Recall the formula for the correlation 
coefficient r and the formula for the slope m of a regression line. Given a set 
of data, why must the slope m of the data’s regression line always have the 
same sign as the data’s correlation coefficient r? 


In Exercises 35 and 36, use StatCrunch and the given data to (a) find
the coefficient of determination, (b) find the standard error of estimate \(s_e\), and (c)
construct a 95% prediction interval for y using the given value of x.


35. Trees The table shows the heights (in feet) and trunk diameters (in inches)
of eight trees. The equation of the regression line is \(\hat{y} = 0.479x - 24.086\).

x = 80 feet

Height, x            72     75     76     85     78     77     82
Trunk diameter, y  10.5   11.0   11.4   14.9   14.0   16.3   15.8


36. Motor Vehicles The table shows the number of motor vehicle registrations
(in millions) and the number of motor vehicle accidents (in millions) in the
United States for six years. The equation of the regression line is
\(\hat{y} = -0.314x + 87.116\).

x = 235 million registrations

Registrations, x   229.6   231.4   237.2   241.2   244.2   247.3
Accidents, y        18.3    11.8    10.9    10.7    10.4    10.6


(Adapted from U.S. Federal Highway Administration and National Safety Council) 
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M@ EXTENDING CONCEPTS 


Hypothesis Testing for Slope In Exercises 37 and 38, use the following
information.


When testing the slope M of the regression line for the population, you usually test
that the slope is 0, or \(H_0: M = 0\). A slope of 0 indicates that there is no linear
relationship between x and y. To perform the t-test for the slope M, use the
standardized test statistic

\[ t = \frac{m}{s_e}\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}} \]

with n − 2 degrees of freedom. Then, using the critical values found in Table 5 in
Appendix B, make a decision whether to reject or fail to reject the null hypothesis.
You can also use the LinRegTTest feature on a TI-83/84 Plus to calculate the
standardized test statistic as well as the corresponding P-value. If \(P \le \alpha\), then reject
the null hypothesis. If \(P > \alpha\), then do not reject \(H_0\).
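
As an illustration of the test statistic, the Python sketch below evaluates it for the gross domestic products and carbon dioxide emissions data, using the slope, standard error of estimate, and sums reported in Examples 2 and 3 of the previous section. The function name is hypothetical, and the decision step is left to a comparison with a tabulated critical value.

```python
import math


def slope_t_statistic(m, s_e, n, sum_x, sum_x2):
    """Standardized test statistic for H0: M = 0 (n - 2 degrees of freedom)."""
    return (m / s_e) * math.sqrt(sum_x2 - sum_x ** 2 / n)


# Values reported for the GDP and CO2 emissions data (n = 10 ordered pairs).
t = slope_t_statistic(m=196.152, s_e=138.255, n=10, sum_x=23.1, sum_x2=67.35)
print(f"t = {t:.3f}")
```

For comparison, the critical value quoted in Example 3 for 8 degrees of freedom at the 95% level is 2.306, so a statistic larger than that in absolute value would lead to rejecting \(H_0\) at \(\alpha = 0.05\).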


37. The following table shows the weights (in pounds) and the number of hours
slept in a day by a random sample of infants. Test the claim that \(M \ne 0\).
Use \(\alpha = 0.01\). Then interpret the results in the context of the problem. If
convenient, use technology to solve the problem.

Weight, x        8.1   10.2    9.9    7.2    6.9   11.2     11   [illegible]
Hours slept, y  14.8   14.6   14.1   14.2   13.8   13.2   13.9   12.5


38. The following table shows the ages (in years) and salaries (in thousands
of dollars) of a random sample of engineers at a company. Test the claim
that \(M \ne 0\). Use \(\alpha = 0.05\). Then interpret the results in the context of
the problem. If convenient, use technology to solve the problem.

Age, x       25     34     29     30     42     38     49     52     35     40
Salary, y  57.5   61.2   59.9   58.7   87.5   67.4   89.2   85.3   69.5   75.1


Confidence Intervals for y-Intercept and Slope You can construct
confidence intervals for the y-intercept B and slope M of the regression line
y = Mx + B for the population by using the following inequalities.


y-intercept B: \(b - E < B < b + E\)

where \(E = t_c s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum x^2 - \dfrac{(\sum x)^2}{n}}}\) and

slope M: \(m - E < M < m + E\)

where \(E = \dfrac{t_c s_e}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{n}}}\).

The values of m and b are obtained from the sample data, and the critical value \(t_c\)
is found using Table 5 in Appendix B with n − 2 degrees of freedom.


In Exercises 39 and 40, construct the indicated confidence interval for B and M using 
the gross domestic products and carbon dioxide emissions data found in Example 2. 


39. 95% confidence interval 40. 99% confidence interval 
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Multiple Regression 


WHAT YOU SHOULD LEARN Finding a Multiple Regression Equation > Predicting y-Values 


» How to use technology to 
find a multiple regression 
equation, the standard 
error of estimate, and the 
coefficient of determination 


» How to use a multiple 
regression equation to predict 
y-values 


INSIGHT 


Because the mathematics 
associated with multiple 
regression is complicated, 
this section focuses 
on how to use 
technology to find 

a multiple regression 
equation and how to 
interpret the results. 


> FINDING A MULTIPLE REGRESSION EQUATION 


In many instances, a better prediction model can be found for a dependent 
(response) variable by using more than one independent (explanatory) variable. 
For instance, a more accurate prediction for the carbon dioxide emissions 
discussed in previous sections might be made by considering the number of cars 
as well as the gross domestic product. Models that contain more than one 
independent variable are multiple regression models. 


DEFINITION 


A multiple regression equation has the form

\[ \hat{y} = b + m_1x_1 + m_2x_2 + m_3x_3 + \cdots + m_kx_k \]

where \(x_1, x_2, x_3, \ldots, x_k\) are the independent variables, b is the y-intercept,
and y is the dependent variable.


The y-intercept b is the value of y when all \(x_i\) are 0. Each coefficient \(m_i\) is the
amount of change in y when the independent variable \(x_i\) is changed by one unit
and all other independent variables are held constant.


EXAMPLE 1   SC Report 44


» Finding a Multiple Regression Equation 

A researcher wants to determine how employee salaries at a certain company 
are related to the length of employment, previous experience, and education. 
The researcher selects eight employees from the company and obtains the 
following data. 


Employee   Salary, y   Employment (in years), x1   Experience (in years), x2   Education (in years), x3
A 57,310 10 2 16
B 57,380 5 6 16 
C 54,135 1 12 
D 56,985 6 5 14 
E 58,715 8 8 16 
F 60,620 20 0 12 
G 59,200 8 4 18 
H 60,320 14 6 17 


Use MINITAB to find a multiple regression equation that models the data. 
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> Solution 
Enter the y-values in C1 and the \(x_1\)-, \(x_2\)-, and \(x_3\)-values in C2, C3, and C4,
respectively. Select “Regression ▸ Regression...” from the Stat menu. Using
the salaries as the response variable and the remaining data as the predictors,
you should obtain results similar to the following.


MINITAB

Regression Analysis: Salary, y versus x1, x2, x3

The regression equation is
Salary, y = 49764 + 364 x1 + 228 x2 + 267 x3

Predictor    Coef      SE Coef    T        P
Constant     49764     1981       25.12    0.000
x1           364.41    48.32      7.54     0.002
x2           227.6     123.8      1.84     0.140
x3           267       147.4      1.81     0.144

S = 659.490    R-Sq = 94.4%    R-Sq(adj) = 90.2%


The regression equation is \(\hat{y} = 49{,}764 + 364x_1 + 228x_2 + 267x_3\).


STUDY TIP

In Example 1, it is important that you interpret the coefficients \(m_1\),
\(m_2\), and \(m_3\) correctly. For instance, if \(x_2\) and \(x_3\) are held constant and
\(x_1\) increases by 1, then y increases by $364. Similarly, if \(x_1\) and \(x_3\) are
held constant and \(x_2\) increases by 1, then y increases by $228. If \(x_1\) and \(x_2\)
are held constant and \(x_3\) increases by 1, then y increases by $267.

> Try It Yourself 1 


A statistics professor wants to determine how students’ final grades are related 
to the midterm exam grades and number of classes missed. The professor 
selects 10 students from her class and obtains the following data. 


Student   Final grade, y   Midterm exam, x1   Classes missed, x2
1         81               75                 1
2         90               80                 0
3         86               91                 2
4         76               80                 3
5         51               62                 6
6         75               90                 4
7         44               60                 7
8         81               82                 2
9         94               88                 0
10        93               96                 1


Use technology to find a multiple regression equation that models the data. 


a. Enter the data. 
b. Calculate the regression line. Answer: Page A45 


MINITAB displays much more than the regression equation and the 
coefficients of the independent variables. For instance, it also displays the 
standard error of estimate, denoted by S, and the coefficient of determination, 
denoted by R-Sqg. In Example 1, S = 659.490 and R-Sq = 94.4%. So, the 
standard error of estimate is $659.49. The coefficient of determination tells you 
that 94.4% of the variation in y can be explained by the multiple regression 
model. The remaining 5.6% is unexplained and is due to other factors or chance. 
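
To give a rough idea of what the technology computes, the following Python sketch fits a multiple regression equation by least squares, solving the normal equations with Gauss-Jordan elimination. It is a minimal illustration under stated assumptions and is not how MINITAB is implemented. The function name is hypothetical, and the data are hypothetical values generated exactly from \(y = 5 + 2x_1 - 3x_2\) so that the fitted coefficients can be checked by eye.

```python
def fit_multiple_regression(xs, ys):
    """Least-squares fit of y = b + m1*x1 + ... + mk*xk.

    xs is a list of rows [x1, ..., xk]; ys is the list of observed y-values.
    Returns the list [b, m1, ..., mk].
    """
    rows = [[1.0] + list(row) for row in xs]   # leading 1 carries the y-intercept
    p = len(rows[0])
    # Augmented matrix of the normal equations: [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Hypothetical data generated exactly from y = 5 + 2*x1 - 3*x2.
xs = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
ys = [5 + 2 * x1 - 3 * x2 for x1, x2 in xs]
b, m1, m2 = fit_multiple_regression(xs, ys)
print(f"y-hat = {b:.3f} + {m1:.3f}*x1 + {m2:.3f}*x2")
```

Statistical software typically uses more numerically stable methods than the normal equations, and it also reports the standard errors, test statistics, and P-values shown in the MINITAB display.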
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In a lake in Finland, 159 fish 
of 7 species were caught and 
measured for weight G (in grams), 
length L (in centimeters), height 
H, and width W (H and W are 
percents of L). The regression 
equation for G and L is 

G = —491 + 28.5L, 

\(r \approx 0.925\), \(r^2 \approx 0.855\).
When all four variables are used, 
the regression equation is 

Gr eee Ors ee 

1.46H + 13.3W, 
r = 0.930, \(r^2 \approx 0.865\).


(Source: Journal of Statistics Education) 


Predict the weight of a fish with 
the following measurements: 

L = 40, H = 17, andW = 11. 
How do your predictions vary 
when you use a single variable 
versus many variables? Which 
do you think is more accurate? 


> PREDICTING y-VALUES 


After finding the equation of the multiple regression line, you can use the 
equation to predict y-values over the range of the data. To predict y-values, 
substitute the given value for each independent variable into the equation, then 
calculate y. 


EXAMPLE 2 


> Predicting y-Values Using Multiple Regression Equations 


Use the regression equation found in Example 1 to predict an employee’s 
salary given the following conditions. 


1. 12 years of current employment, 5 years of experience, and 16 years of 
education 


2. 4 years of current employment, 2 years of experience, and 12 years of 
education 


3. 8 years of current employment, 7 years of experience, and 17 years of 
education 


> Solution 


To predict each employee’s salary, substitute the values for \(x_1\), \(x_2\), and \(x_3\) into
the regression equation. Then calculate \(\hat{y}\).


1. \(\hat{y} = 49{,}764 + 364x_1 + 228x_2 + 267x_3\)
   \(= 49{,}764 + 364(12) + 228(5) + 267(16)\)
   \(= 59{,}544\)
   The employee’s predicted salary is $59,544.
2. \(\hat{y} = 49{,}764 + 364x_1 + 228x_2 + 267x_3\)
   \(= 49{,}764 + 364(4) + 228(2) + 267(12)\)
   \(= 54{,}880\)
   The employee’s predicted salary is $54,880.
3. \(\hat{y} = 49{,}764 + 364x_1 + 228x_2 + 267x_3\)
   \(= 49{,}764 + 364(8) + 228(7) + 267(17)\)
   \(= 58{,}811\)
   The employee’s predicted salary is $58,811.
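
The substitutions above can be written as a short Python sketch. The function name is hypothetical, and the coefficients are the rounded values from the regression equation found in Example 1.

```python
def predict_salary(x1, x2, x3):
    """Predicted salary from the Example 1 equation (rounded coefficients).

    x1: years of current employment, x2: years of previous experience,
    x3: years of education.
    """
    return 49_764 + 364 * x1 + 228 * x2 + 267 * x3


# The three sets of conditions from Example 2.
cases = [(12, 5, 16), (4, 2, 12), (8, 7, 17)]
for x1, x2, x3 in cases:
    print(f"x1={x1}, x2={x2}, x3={x3}: ${predict_salary(x1, x2, x3):,}")
```

As the text notes, predictions of this kind are meant for values of the independent variables that lie within the range of the data.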


> Try It Yourself 2 


Use the regression equation found in Try It Yourself 1 to predict a student’s 
final grade given the following conditions. 


1. A student has a midterm exam score of 89 and misses 1 class.
2. A student has a midterm exam score of 78 and misses 3 classes.
3. A student has a midterm exam score of 83 and misses 2 classes.


a. Substitute the midterm score for \(x_1\) into the regression equation.

b. Substitute the corresponding number of missed classes for \(x_2\) into the
regression equation.

c. Calculate \(\hat{y}\).

d. What is each student’s final grade? Answer: Page A45
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DZD Exercises 


M@ BUILDING BASIC SKILLS AND VOCABULARY 


Predicting y-Values In Exercises 1-4, use the multiple regression equation to
predict the y-values for the given values of the independent variables.


1. Potato Yield The equation used to predict the annual potato yield (in
pounds per acre) is

\(\hat{y} = 61{,}298 + 57.56x_1 - 78.45x_2\)

where \(x_1\) is the number of acres planted (in thousands) and \(x_2\) is the number
of acres harvested (in thousands). (Adapted from United States Department of
Agriculture)

(a) \(x_1 = 1100\), \(x_2 = 1090\)    (b) \(x_1 = 1060\), \(x_2 = 1050\)
(c) \(x_1 = 1300\), \(x_2 = 1250\)    (d) \(x_1 = 1140\), \(x_2 = 1120\)


2. Rye Yield The equation used to predict the annual rye yield (in bushels per
acre) is

\(\hat{y} = 22 - 0.027x_1 + 0.156x_2\)

where \(x_1\) is the number of acres planted (in thousands) and \(x_2\) is the
number of acres harvested (in thousands). (Source: United States Department of
Agriculture)

(a) \(x_1 = 1250\), \(x_2 = 250\)    (b) \(x_1 = 1400\), \(x_2 = 275\)
(c) \(x_1 = 1425\), \(x_2 = 300\)    (d) \(x_1 = 1300\), \(x_2 = 250\)


3. Black Cherry Tree Volume The volume (in cubic feet) of a black cherry tree
can be modeled by the equation

\(\hat{y} = -52.2 + 0.3x_1 + 4.5x_2\)

where \(x_1\) is the tree’s height (in feet) and \(x_2\) is the tree’s diameter (in inches).
(Source: Journal of the Royal Statistical Society)

(a) \(x_1 = 70\), \(x_2 = 8.6\)     (b) \(x_1 = 65\), \(x_2 = 11.0\)
(c) \(x_1 = 83\), \(x_2 = 17.6\)    (d) \(x_1 = 87\), \(x_2 = 19.6\)


4. Elephant Weight The equation used to predict the weight of an elephant
(in kilograms) is

\(\hat{y} = -4016 + 11.5x_1 + 7.55x_2 + 12.5x_3\)

where \(x_1\) represents the girth of the elephant (in centimeters), \(x_2\) represents
the length of the elephant (in centimeters), and \(x_3\) represents the circumference
of a footpad (in centimeters). (Source: Field Trip Earth)

(a) \(x_1 = 421\), \(x_2 = 224\), \(x_3 = 144\)    (b) \(x_1 = 311\), \(x_2 = 171\), \(x_3 = 102\)
(c) \(x_1 = 376\), \(x_2 = 226\), \(x_3 = 124\)    (d) \(x_1 = 231\), \(x_2 = 135\), \(x_3 = 86\)


M@ USING AND INTERPRETING CONCEPTS 


Finding a Multiple Regression Equation In Exercises 5 and 6, use
technology to find the multiple regression equation for the data shown in the table.
Then answer the following and interpret the results.


(a) What is the standard error of estimate? 
(b) What is the coefficient of determination? 
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"5. Sales The total square footage (in billions) of retailing space at shopping 
centers, the number (in thousands) of shopping centers, and the sales (in 

billions of dollars) for shopping centers for a recent 11-year period are 

shown in the table. (Adapted from International Council of Shopping Centers) 


893.8 5.0 41.2 
933.9 5.1 42.1 
980.0 5.2 43.0 
1032.4 5.3 43.7 
1105.3 5.5 44.4 
1181.1 5.6 45.1 
1221.7 5.7 45.8
1277.2 5.8 46.4 
1339.2 5.9 47.1 
1432.6 6.0 47.8 
1530.4 6.1 48.7 


" 6. Shareholder’s Equity The following table shows the net sales (in billions 

of dollars), total assets (in billions of dollars), and shareholder’s equity (in 
billions of dollars) for Wal-Mart for a recent five-year period. (Adapted 
from Wal-Mart Stores, Inc.) 


53.2 308.9 138.8 
61.6 344.8 151.6 
64.6 373.8 163.5 
65.3 401.1 163.4 
70.7 405.0 170.7 


7. Use StatCrunch to find the multiple regression equation for the data 
in Exercise 5. Compare this result with the equation found in Exercise 5. 


8. Use StatCrunch to find the multiple regression equation for the data 
in Exercise 6. Compare this result with the equation found in Exercise 6. 


EXTENDING CONCEPTS


Adjusted r² The calculation of r², the coefficient of determination, depends on
the number of data pairs and the number of independent variables. An adjusted
value of r² can be calculated, based on the number of degrees of freedom,
as follows.


\[ r^2_{adj} = 1 - \left[ \frac{(1 - r^2)(n - 1)}{n - k - 1} \right] \]

where n is the number of data pairs and k is the number of independent variables. 
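
The adjusted value can be computed directly from r², n, and k. The following is a small Python sketch of the formula; the function name `adjusted_r2` and the sample inputs (r² = 0.90, n = 11, k = 2) are hypothetical and are not taken from the exercises.

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of determination.

    r2: coefficient of determination
    n:  number of data pairs
    k:  number of independent variables
    """
    if n - k - 1 <= 0:
        raise ValueError("n must be greater than k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical inputs: r^2 = 0.90 with 11 data pairs and 2 independent variables
print(adjusted_r2(0.90, 11, 2))  # about 0.875
```

Because (n - 1)/(n - k - 1) is greater than 1 whenever k is at least 1, the adjusted value is never larger than r².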


In Exercises 9 and 10, after calculating r²_adj, determine the percentage of the
variation in y that can be explained by the relationships between variables
according to r²_adj. Compare this result with the one obtained using r².


9. Calculate r²_adj for the data in Exercise 5.


10. Calculate r²_adj for the data in Exercise 6.




USES AND ABUSES 


Uses 


Correlation and Regression Correlation and regression analysis can be used 
to determine whether there is a significant relationship between two variables. 
If there is, you can use one of the variables to predict the value of the other 
variable. For instance, educators have used correlation and regression analysis 
to determine that there is a significant correlation between a student’s SAT 
score and the grade point average from a student’s freshman year at college. 
Consequently, many colleges and universities use SAT scores of high school 
applicants as a predictor of the applicant’s initial success at college. 


Abuses 


Confusing Correlation and Causation The most common abuse of 
correlation in studies is to confuse the concepts of correlation with those of 
causation (see page 494). Good SAT scores do not cause good college grades. 
Rather, there are other variables, such as good study habits and motivation, 
that contribute to both. When a strong correlation is found between two 
variables, look for other variables that are correlated with both. 


Considering Only Linear Correlation The correlation studied in this
chapter is linear correlation. When the correlation coefficient is close to 1 or
close to -1, the data points can be modeled by a straight line. It is possible that
a correlation coefficient is close to 0 but there is still a strong correlation of a
different type. Consider the data listed in the table at the left. The value of the
correlation coefficient is 0; however, the data are perfectly correlated with the
equation x² + y² = 1, as shown in the graph.
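
To see how a correlation coefficient of 0 can coexist with a perfect nonlinear relationship, the sketch below computes r for four points that satisfy x² + y² = 1. The points are chosen here for illustration and are an assumption, and the function name `correlation` is hypothetical.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation coefficient r from the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Illustrative points on the circle x^2 + y^2 = 1 (assumed, not from the text)
xs = [1, 0, -1, 0]
ys = [0, 1, 0, -1]

on_circle = all(x * x + y * y == 1 for x, y in zip(xs, ys))
r = correlation(xs, ys)
print(on_circle, r)  # True 0.0
```

Every point lies exactly on the circle, yet the numerator of r is 0 because the products xy and the sums of x and y all vanish.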


Ethics

When data are collected, all of the data should be used when calculating
statistics. In this chapter, you learned that before finding the equation of a
regression line, it is helpful to construct a scatter plot of the data to check for
outliers, gaps, and clusters in the data. Researchers cannot use only those data
points that fit their hypotheses or those that show a significant correlation.
Although eliminating outliers may help a data set coincide with predicted
patterns or fit a regression line, it is unethical to amend data in such a way. An
outlier or any other point that influences a regression model can be removed
only if it is properly justified.

In most cases, the best and sometimes safest approach for presenting 
statistical measurements is with and without an outlier being included. By 
doing this, the decision as to whether or not to recognize the outlier is left to 
the reader. 


EXERCISES


1. Confusing Correlation and Causation Find an example of an article that 
confuses correlation and causation. Discuss other variables that could 
contribute to the relationship between the variables. 


2. Considering Only Linear Correlation Find an example of two real-life 
variables that have a nonlinear correlation. 




CHAPTER SUMMARY


What did you learn?                                        EXAMPLE(S)    REVIEW EXERCISES

Section 9.1

How to construct a scatter plot                            1-3           1-4

How to find a correlation coefficient                      4, 5          1-4

\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} \]

How to perform a hypothesis test for a population
correlation coefficient ρ                                  7             5-10

\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]

Section 9.2

How to find the equation of a regression line,
ŷ = mx + b                                                 1, 2          11-14

\[ m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \]

\[ b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n} \]

How to predict y-values using a regression equation        3             15-18

Section 9.3

How to find and interpret the coefficient of
determination r²                                           1             19-24

How to find and interpret the standard error of
estimate for a regression line                             2             23, 24

\[ s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{\sum y^2 - b\sum y - m\sum xy}{n - 2}} \]

How to construct and interpret a prediction interval
for y, ŷ - E < y < ŷ + E                                   3             25-30

\[ E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} \]

Section 9.4

How to use technology to find a multiple regression
equation, the standard error of estimate, and the
coefficient of determination                               1             31, 32

How to use a multiple regression equation to predict
y-values                                                   2             33, 34

\[ \hat{y} = b + m_1x_1 + m_2x_2 + m_3x_3 + \cdots + m_kx_k \]
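
The Section 9.2 formulas for the slope m and y-intercept b translate directly into code. The following is a minimal Python sketch; the function names and the five data pairs are hypothetical and are used only to show the arithmetic.

```python
def regression_line(xs, ys):
    """Return (m, b) for the regression line y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n  # b = y-bar - m * x-bar
    return m, b


def predict(m, b, x):
    """Predicted y-value for a given x."""
    return m * x + b


# Hypothetical data pairs (not from the exercises)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = regression_line(xs, ys)
print(round(m, 3), round(b, 3))      # 0.6 2.2
print(round(predict(m, b, 3.5), 3))  # 4.3
```

As the exercises caution, a prediction is meaningful only for x-values in or close to the range of the data.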
REVIEW EXERCISES


SECTION 9.1


In Exercises 1-4, display the data in a scatter plot. Then calculate the sample 
correlation coefficient r. Determine whether there is a positive linear correlation, 
a negative linear correlation, or no linear correlation between the variables. What 
can you conclude? 


1. The number of pass attempts and passing yards for seven professional
quarterbacks for a recent regular season (Source: National Football League)


Pass attempts, x    583    571    550    541    506    514    486
Passing yards, y    4770   4500   4483   4434   4328   4388   4254


2. The number of wildland fires (in thousands) and the number of wildland acres
burned (in millions) in the United States for eight years (Source: National
Interagency Coordinate Center)


Fires, x    84.1   73.5   63.6   65.5   66.8   96.4   85.7   79.0
Acres, y    3.6    7.2    4.0    8.1    8.7    9.9    9.3    5.3


3. The IQ and brain size, as measured by the total pixel count (in thousands) 
from an MRI scan, for nine female college students (Adapted from Intelligence) 


140 | 96 8 101 135 85 77 88 
856 879 865 808 | 791 799 794 | 894 


4. The annual per capita sugar consumption (in kilograms) and the average 
number of cavities of 11- and 12-year-old children in seven countries 


"Sugar consumption, x 24 | so | 6@| 6s | 99 |-e7 | te 


0.59 | 1.51 | 1.55 | 1.70 | 2.18 | 2.10 | 2.73 


In Exercises 5 and 6, use the given sample statistics to test the claim about the
population correlation coefficient ρ at the indicated level of significance α.

5. Claim: ρ ≠ 0; α = 0.01. Sample statistics: r = 0.24, n = 26

6. Claim: ρ ≠ 0; α = 0.05. Sample statistics: r = -0.55, n = 22
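
A two-tailed t-test for ρ compares the standardized test statistic with a critical value from a t-distribution table with n - 2 degrees of freedom. The sketch below uses hypothetical sample statistics (r = 0.5, n = 27) rather than those of Exercises 5 and 6, and the critical value 2.060 (α = 0.05, d.f. = 25) is assumed to be read from a t-table. The function names are illustrative.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """Standardized test statistic t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / sqrt((1 - r ** 2) / (n - 2))


def reject_null(t, critical_value):
    """Two-tailed decision: reject H0 (rho = 0) when |t| exceeds the critical value."""
    return abs(t) > critical_value


# Hypothetical sample statistics (not from the exercises)
r, n = 0.5, 27
t = correlation_t_statistic(r, n)

# Critical value for alpha = 0.05, d.f. = n - 2 = 25, taken from a t-table
t_critical = 2.060

print(round(t, 3), reject_null(t, t_critical))  # 2.887 True
```

Here t falls in the rejection region, so the claim ρ ≠ 0 would be supported at the 5% level for these hypothetical values.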


In Exercises 7-10, test the claim about the population correlation coefficient ρ at
the indicated level of significance α. Then interpret the decision in the context of
the original claim.


7. Refer to the data in Exercise 1. At α = 0.05, test the claim that there is a
significant linear correlation between a quarterback’s pass attempts and
passing yards.


8. Refer to the data in Exercise 2. At α = 0.05, is there enough evidence to
conclude that there is a significant linear correlation between the number of
wildland fires and the number of acres burned?


9. Refer to the data in Exercise 3. At α = 0.01, test the claim that there is a
significant linear correlation between a female college student’s IQ and brain
size.


10. Refer to the data in Exercise 4. At α = 0.01, is there enough evidence to
conclude that there is a significant linear correlation between sugar
consumption and tooth decay?


SECTION 9.2


In Exercises 11-14, find the equation of the regression line for the given data. Then 
construct a scatter plot of the data and draw the regression line. Can you make 
a guess about the sign and magnitude of r? Calculate r and check your guess. 
If convenient, use technology to solve the problem. 


11. The amounts of milk (in billions of pounds) produced in the United States
and the average prices per gallon of milk for nine years (Adapted from U.S.
Department of Agriculture and U.S. Bureau of Labor Statistics)


Milk produced, x       167.6   165.3   170.1   170.4   170.9   177.0   181.8   185.7   190.0
Price per gallon, y    2.79    2.90    2.68    2.95    3.23    3.24    3.00    3.87    3.68


12. The average times (in hours) per day spent watching television for men
and women for the last 10 years (Adapted from The Nielsen Company)


Men, x      4.03   4.18   4.32   4.37   4.48   4.43   4.52   4.58   4.65   4.82
Women, y    4.67   4.77   4.85   4.97   5.08   5.12   5.28   5.28   5.32   5.42


13. The ages (in years) and the number of hours of sleep in one night for 
seven adults 


20 59 42 68 38 75 
9 5 6 | 5 8 4 


14. The engine displacements (in cubic inches) and the fuel efficiencies (in miles
per gallon) of seven automobiles


Displacement, x       170    134    220    305    109    256    322
Fuel efficiency, y    29.5   34.5   23.0   17.0   33.5   23.0   15.5


In Exercises 15-18, use the regression equations found in Exercises 11-14 to 
predict the value of y for each value of x, if meaningful. If not, explain why not. 
(Each pair of variables has a significant correlation.) 


15. Refer to Exercise 11. What price per gallon would you predict for a milk 
production of (a) 160 billion pounds? (b) 175 billion pounds? (c) 180 billion 
pounds? (d) 200 billion pounds? 


16. Refer to Exercise 12. What average time per day spent watching television
for women would you predict when the average time per day for men is
(a) 4.2 hours? (b) 4.5 hours? (c) 4.75 hours? (d) 5 hours?


17. Refer to Exercise 13. How many hours of sleep would you predict for an
adult of age (a) 18 years? (b) 25 years? (c) 85 years? (d) 50 years?


18. Refer to Exercise 14. What fuel efficiency rating would you predict for a car
with an engine displacement of (a) 86 cubic inches? (b) 198 cubic inches?
(c) 289 cubic inches? (d) 407 cubic inches?


SECTION 9.3


In Exercises 19-22, use the value of the linear correlation coefficient to calculate 
the coefficient of determination. What does this tell you about the explained 
variation of the data about the regression line? About the unexplained variation? 


19. r = -0.450 20. r = -0.937
21. r = 0.642 22. r = 0.795


In Exercises 23 and 24, use the data to find the (a) coefficient of determination
r² and interpret the result, and (b) standard error of estimate s_e and interpret
the result.


23. The table shows the prices (in thousands of dollars) and fuel efficiencies
(in miles per gallon) for nine compact sports sedans. The regression equation
is ŷ = -0.414x + 37.147. (Adapted from Consumer Reports)


Price, x              29.7   33.7   37.5   32.7   39.2   37.3   31.6


Fuel efficiency, y    21   19   25   24   22   24   23   21   23


24. The table shows the cooking areas (in square inches) of 18 gas grills and
their prices (in dollars). The regression equation is ŷ = 1.454x - 532.053.
(Source: Lowe’s)


Area, x     780   530   942   660   600   732   660   640   869
Price, y    359   98    547   299   449   799   699   199   1049


Area, x     860   700   942   890   733   732   464   869   600
Price, y    499   248   597   999   428   849   99    999   399


In Exercises 25-30, construct the indicated prediction interval and interpret the 
results. 


25. Construct a 90% prediction interval for the price per gallon of milk in 
Exercise 11 when 185 billion pounds of milk is produced. 


26. Construct a 90% prediction interval for the average time women spend per 
day watching television in Exercise 12 when the average time men spend per 
day watching television is 4.25 hours. 


27. Construct a 95% prediction interval for the number of hours of sleep for an 
adult in Exercise 13 who is 45 years old. 


28. Construct a 95% prediction interval for the fuel efficiency of an automobile
in Exercise 14 that has an engine displacement of 265 cubic inches.


29. Construct a 99% prediction interval for the fuel efficiency of a compact
sports sedan in Exercise 23 that costs $39,900.


30. Construct a 99% prediction interval for the price of a gas grill in Exercise 24 
with a usable cooking area of 900 square inches. 
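
The steps behind a prediction interval (find s_e, then the margin of error E, then ŷ - E and ŷ + E) can be sketched in Python as follows. The data pairs, the regression coefficients m = 0.6 and b = 2.2 fitted to them, and the function names are hypothetical; the critical value t_c = 3.182 (95% confidence, d.f. = 3) is assumed to come from a t-table.

```python
from math import sqrt


def standard_error(xs, ys, m, b):
    """Standard error of estimate s_e for the line y-hat = m*x + b."""
    n = len(xs)
    unexplained = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return sqrt(unexplained / (n - 2))


def margin_of_error(xs, x0, s_e, t_c):
    """Margin of error E for a prediction interval at x = x0."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n
    spread = n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    return t_c * s_e * sqrt(1 + 1 / n + spread)


# Hypothetical data and its regression line (not from the exercises)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = 0.6, 2.2

s_e = standard_error(xs, ys, m, b)
t_c = 3.182  # 95% confidence, d.f. = n - 2 = 3, from a t-table
x0 = 4
y_hat = m * x0 + b
E = margin_of_error(xs, x0, s_e, t_c)

print(round(s_e, 3))                          # 0.894
print(round(y_hat - E, 3), round(y_hat + E, 3))
```

The interval widens as x0 moves away from the mean of the x-values, because the term n(x0 - x̄)² grows.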


SECTION 9.4


In Exercises 31 and 32, use the data in the table, which shows the carbon monoxide,
tar, and nicotine content, all in milligrams, of 14 brands of U.S. cigarettes. (Source:
Federal Trade Commission)


Carbon monoxide, y    Tar, x_1    Nicotine, x_2
15                    16          1.1
17                    16          1.0
11                    10          0.8
12                    11          0.9
14                    13          0.8
16                    14          0.8
14                    16          1.2
16                    16          1.2
10                    10          0.8
18                    19          1.4
17                    17          1.2
11                    12          1.0
10                    9           0.7
14                    15          1.2


31. Use technology to find the multiple regression equation for the data.


32. Find the standard error of estimate s_e and the coefficient of determination
r². What percentage of the variation of y can be explained by the
regression equation?
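
One way "technology" can find a multiple regression equation is to solve the least-squares normal equations. The sketch below does this in plain Python with Gauss-Jordan elimination; it is an illustration rather than the method used by MINITAB, Excel, or the TI-83/84 Plus. It assumes the three table columns are carbon monoxide (y), tar (x_1), and nicotine (x_2), in that order, and all function names are hypothetical.

```python
def solve_linear_system(a, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_multiple_regression(xs, ys):
    """Return [b, m_1, ..., m_k] minimizing the sum of squared residuals."""
    design = [[1.0] + list(row) for row in xs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def fit_summary(xs, ys):
    """Return (coefficients, standard error of estimate, r^2)."""
    coeffs = fit_multiple_regression(xs, ys)
    n, k = len(ys), len(xs[0])
    predicted = [coeffs[0] + sum(m * x for m, x in zip(coeffs[1:], row)) for row in xs]
    y_bar = sum(ys) / n
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    total = sum((y - y_bar) ** 2 for y in ys)
    return coeffs, (unexplained / (n - k - 1)) ** 0.5, 1 - unexplained / total


# Table data, assuming column order: carbon monoxide (y), tar (x_1), nicotine (x_2)
co = [15, 17, 11, 12, 14, 16, 14, 16, 10, 18, 17, 11, 10, 14]
tar = [16, 16, 10, 11, 13, 14, 16, 16, 10, 19, 17, 12, 9, 15]
nicotine = [1.1, 1.0, 0.8, 0.9, 0.8, 0.8, 1.2, 1.2, 0.8, 1.4, 1.2, 1.0, 0.7, 1.2]

coeffs, s_e, r2 = fit_summary([[t, n] for t, n in zip(tar, nicotine)], co)
print("b, m_1, m_2 =", [round(c, 3) for c in coeffs])
print("s_e =", round(s_e, 3), " r^2 =", round(r2, 3))
```

The standard error here divides by n - k - 1, where k is the number of independent variables, which reduces to n - 2 for a single independent variable.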


In Exercises 33 and 34, use the multiple regression equation to predict the y-values
for the given values of the independent variables.


33. An equation that can be used to predict fuel economy (in miles per gallon)
for automobiles is ŷ = 41.3 - 0.004x_1 - 0.0049x_2, where x_1 is the engine
displacement (in cubic inches) and x_2 is the vehicle weight (in pounds).

(a) x_1 = 305, x_2 = 3750
(b) x_1 = 225, x_2 = 3100
(c) x_1 = 105, x_2 = 2200
(d) x_1 = 185, x_2 = 3000

34. Use the regression equation found in Exercise 31.
(a) x_1 = 10, x_2 = 0.7
(b) x_1 = 15, x_2 = 1.1
(c) x_1 = 13, x_2 = 0.8
(d) x_1 = 9, x_2 = 0.8


CHAPTER QUIZ


Take this quiz as you would take a quiz in class. After you are done, check your
work against the answers given in the back of the book.


For Exercises 1-8, use the data in the table, which shows the average annual
salaries (both in thousands of dollars) for public school principals and public
school classroom teachers in the United States for 11 years. (Adapted from
Educational Research Service)


62.5 ye) 
71.9 41.4 
74.4 42.2 
77.8 43.7
78.4 43.8 
80.8 45.0 
80.5 45.6 
81.5 45.9 
84.8 48.2 
87.7 49.3 
91.6 51.3 


1. Construct a scatter plot for the data. Do the data appear to have a positive 
linear correlation, a negative linear correlation, or no linear correlation? 
Explain. 

2. Calculate the correlation coefficient r. What can you conclude? 

3. Test the level of significance of the correlation coefficient r. Use α = 0.05.


4. Find the equation of the regression line for the data. Draw the regression line
on the scatter plot.


5. Use the regression equation to predict the average annual salary of public
school classroom teachers when the average annual salary of public school
principals is $90,500.


6. Find the coefficient of determination r² and interpret the result.

7. Find the standard error of estimate s_e and interpret the result.


8. Construct a 95% prediction interval for the average annual salary of public
school classroom teachers when the average annual salary of public school
principals is $85,750. Interpret the results.


9. Stock Price The equation used to predict the stock price (in dollars) at the
end of the year for McDonald’s Corporation is

\[ \hat{y} = -47 + 5.91x_1 - 1.99x_2 \]

where x_1 is the total revenue (in billions of dollars) and x_2 is the shareholders’
equity (in billions of dollars). Use the multiple regression equation to predict
the y-values for the given values of the independent variables. (Adapted from
McDonald’s Corporation)


(a) x_1 = 22.7, x_2 = 14.0
(b) x_1 = 17.9, x_2 = 14.2
(c) x_1 = 20.9, x_2 = 15.5
(d) x_1 = 19.1, x_2 = 15.1


REAL STATISTICS - REAL DECISIONS: PUTTING IT ALL TOGETHER


Acid rain affects the environment by increasing the acidity of lakes
and streams to dangerous levels, damaging trees and soil, accelerating
the decay of building materials and paint, and destroying national
monuments. The goal of the Environmental Protection Agency’s (EPA)
Acid Rain Program is to achieve environmental health benefits by
reducing the emissions of the primary causes of acid rain: sulfur dioxide
and nitrogen oxides.

You work for the EPA and you want to determine if there is a
significant correlation between sulfur dioxide emissions and nitrogen
oxides emissions.


1. Analyzing the Data

(a) The data in the table show the sulfur dioxide emissions
(in millions of tons) and the nitrogen oxides emissions (in
millions of tons) for 14 years. Construct a scatter plot of the data
and make a conclusion about the type of correlation between
sulfur dioxide emissions and nitrogen oxides emissions.

(b) Calculate the correlation coefficient r and verify your
conclusion in part (a).

(c) Test the significance of the correlation coefficient found in part
(b). Use α = 0.05.

(d) Find the equation of the regression line for sulfur dioxide
emissions and nitrogen oxides emissions. Add the graph of the
regression line to your scatter plot in part (a). Does the
regression line appear to be a good fit?

(e) Can you use the equation of the regression line to predict the
nitrogen oxides emission given the sulfur dioxide emission?
Why or why not?

(f) Find the coefficient of determination r² and the standard error
of estimate s_e. Interpret your results.


2. Making Predictions

The EPA set a goal of reducing sulfur dioxide emissions levels by 10
million tons from 1980 levels of 17.3 million tons. Construct a 95%
prediction interval for the nitrogen oxides emissions for this sulfur
dioxide emissions goal level. Interpret the results.


Sulfur dioxide emissions, x    Nitrogen oxides emissions, y
11.8                           5.8
12.5                           6.0
12.9                           6.0
13.1                           6.0
12.5                           5.5
11.2                           5.1
10.6                           4.7
10.2                           4.5
10.6                           4.2
10.3                           3.8
10.2                           3.6
9.4                            3.4
8.9                            3.3
7.6                            3.0

Source: Environmental Protection Agency


TECHNOLOGY


NUTRIENTS IN BREAKFAST CEREALS

U.S. Food and Drug Administration


The U.S. Food and Drug Administration (FDA) requires
nutrition labeling for most foods. Under FDA regulations,
manufacturers are required to list the amounts of certain
nutrients in their foods, such as calories, sugar, fat, and
carbohydrates. This nutritional information is displayed in
the “Nutrition Facts” panel on the food’s package.

The table shows the following nutritional content for
one cup of each of 21 different breakfast cereals.


C = calories
S = sugar in grams
F = fat in grams
R = carbohydrates in grams


Cereal                    C      S     F      R
Apple Jacks®              100    12    0.5    25
Berry Burst Cheerios®     130    11    1.5    29
Cheerios®                 100    1     2      20
Cocoa Puffs®              130    15    2      31
Cookie Crisp®             130    13    1.5    29
Corn Chex®                120    3     0.5    26
Corn Flakes®              100    2     0      24
Corn Pops®                120    10    0      29
Count Chocula®            150    16    1.5    31
Crispix®                  110    4     0      25
Froot Loops®              110    12    1      25
Frosted Flakes®           150    15    0      36
Golden Grahams®           160    15    1.5    35
Honey Nut Cheerios®       150    12    2      29
Lucky Charms®             150    15    1.5    29
Multi Grain Cheerios®     110    6     1      23
Raisin Bran®              190    19    1.5    45
Rice Krispies®            100    3     0      23
Special K®                120    4     0.5    23
Trix®                     120    11    1.5    28
Wheaties®                 130    5     0.5    29


Extended solutions are given in the Technology Supplement.
Technical instruction is provided for MINITAB, Excel, and the TI-83/84 Plus.


EXERCISES


1. Use a technology tool to draw a scatter plot of
the following (x, y) pairs in the data set.
(a) (calories, sugar)
(b) (calories, fat)
(c) (calories, carbohydrates)
(d) (sugar, fat)
(e) (sugar, carbohydrates)
(f) (fat, carbohydrates)

2. From the scatter plots in Exercise 1, which pairs
of variables appear to have a strong linear
correlation?

3. Use a technology tool to find the correlation
coefficient for each pair of variables in Exercise 1.
Which has the strongest linear correlation?

4. Use a technology tool to find an equation of a
regression line for the following variables.
(a) (calories, sugar)
(b) (calories, carbohydrates)

5. Use the results of Exercise 4 to predict the
following.
(a) The sugar content of one cup of cereal that
has a caloric content of 120 calories
(b) The carbohydrate content of one cup of cereal
that has a caloric content of 120 calories

6. Use a technology tool to find the multiple
regression equations of the following forms.
(a) C = b + m_1S + m_2F + m_3R
(b) C = b + m_1S + m_2R

7. Use the equations from Exercise 6 to predict the
caloric content of 1 cup of cereal that has
7 grams of sugar, 0.5 gram of fat, and 31 grams of
carbohydrates.


CHI-SQUARE
TESTS AND THE
F-DISTRIBUTION


10.1 Goodness-of-Fit Test

10.2 Independence
     CASE STUDY

10.3 Comparing Two
     Variances

10.4 Analysis of Variance
     USES AND ABUSES
     REAL STATISTICS - REAL DECISIONS
     TECHNOLOGY


Crash tests performed by the Insurance 
Institute for Highway Safety demonstrate how 
a vehicle will react when in a realistic collision. 
Tests are performed on the front, side, and 
rear of the vehicles. Results of these tests are 
classified using the ratings good, acceptable, 
marginal, and poor.


WHERE YOU’VE BEEN


The Insurance Institute for Highway Safety buys
new vehicles each year and crashes them into a
barrier at 40 miles per hour to compare how
different vehicles protect drivers in a frontal
offset crash. In this test, 40% of the total width of
the vehicle strikes the barrier on the driver side.
The forces and impacts that occur during a crash
test are measured by equipping dummies with
special instruments and placing them in the car.
The crash test results include data on head, chest,
and leg injuries. For a low crash test number, the
injury potential is low. If the crash test number is
high, then the injury potential is high. Using the
techniques of Chapter 8, you can determine if
the mean chest injury potential is the same for
pickups and minivans. (Assume the population
variances are equal.) The sample statistics are as
follows. (Adapted from Insurance Institute for
Highway Safety)


Minivans    n_1 = 9     x̄_1 = 29.9    s_1 = 3.33
Pickups     n_2 = 19    x̄_2 = 30.4    s_2 = 4.21


For the means of chest injury, the P-value for the
hypothesis that μ_1 = μ_2 is about 0.7575. At
α = 0.05, you fail to reject the null hypothesis.
So, you do not have enough evidence to
conclude that there is a significant difference in
the means of the chest injury potential in a
frontal offset crash at 40 miles per hour for
minivans and pickups.


WHERE YOU’RE GOING


In Chapter 8, you learned how to test a
hypothesis that compares two populations by
basing your decisions on sample statistics and
their distributions. In this chapter, you will learn
how to test a hypothesis that compares three or
more populations.


For instance, in addition to the crash tests for
minivans and pickups, a third group of vehicles
was also tested. The results for all three types of
vehicles are as follows.


Minivans        n_1 = 9     x̄_1 = 29.9
Pickups         n_2 = 19    x̄_2 = 30.4
Midsize SUVs    n_3 = 32    x̄_3 = 34.1


From these three samples, is there evidence of
a difference in chest injury potential among
minivans, pickups, and midsize SUVs in a frontal
offset crash at 40 miles per hour?


In this chapter, you will learn that you can
answer this question by testing the hypothesis
that the three means are equal. For the means of
chest injury, the P-value for the hypothesis that
μ_1 = μ_2 = μ_3 is about 0.0088. At α = 0.05, you
can reject the null hypothesis. So, you can
conclude that for the three types of vehicles
tested, at least one of the means of the chest
injury potential in a frontal offset crash at
40 miles per hour is different from the others.


10.1 Goodness-of-Fit Test


WHAT YOU SHOULD LEARN


» How to use the chi-square
distribution to test whether a
frequency distribution fits
a claimed distribution


INSIGHT


The hypothesis tests described in Sections 10.1 and 10.2 can be
used for qualitative data.


THE CHI-SQUARE GOODNESS-OF-FIT TEST


Suppose a tax preparation company wants to determine the proportions of 
people who used different methods to prepare their taxes. To determine these 
proportions, the company can perform a multinomial experiment. A multinomial 
experiment is a probability experiment consisting of a fixed number of 
independent trials in which there are more than two possible outcomes for each 
trial. The probability of each outcome is fixed, and each outcome is classified into 
categories. (Remember from Section 4.2 that a binomial experiment has only two 
possible outcomes.) 

Now, suppose the company wants to test a previous survey’s claim concerning 
the distribution of proportions of people who used different methods to prepare 
their taxes. To do so, the company could compare the distribution of proportions 
obtained in the multinomial experiment with the previous survey’s specified 
distribution. How can the company compare the distributions? The answer is, 
perform a chi-square goodness-of-fit test. 


DEFINITION 


A chi-square goodness-of-fit test is used to test whether a frequency 
distribution fits an expected distribution. 


To begin a goodness-of-fit test, you must first state a null and an alternative 
hypothesis. Generally, the null hypothesis states that the frequency distribution 
fits the specified distribution and the alternative hypothesis states that the 
frequency distribution does not fit the specified distribution. 

For instance, suppose the previous survey claims that the distribution of 
people who used different methods to prepare their taxes is as shown below. 


Accountant 25% 
By hand 20% 
Computer software 35% 
Friend/family 5% 
Tax preparation service 15% 


To test the previous survey’s claim, the company can perform a chi-square 
goodness-of-fit test using the following null and alternative hypotheses. 


H_0: The distribution of tax preparation methods is 25% by accountant,
20% by hand, 35% by computer software, 5% by friend or family, and
15% by tax preparation service. (Claim)


H_a: The distribution of tax preparation methods differs from the claimed or
expected distribution.


To calculate the test statistic for the chi-square goodness-of-fit test, you can
use observed frequencies and expected frequencies. To calculate the expected
frequencies, you must assume the null hypothesis is true.


DEFINITION

The observed frequency O of a category is the frequency for the category
observed in the sample data.

The expected frequency E of a category is the calculated frequency for the
category. Expected frequencies are obtained assuming the specified (or
hypothesized) distribution. The expected frequency for the ith category is

\[ E_i = np_i \]

where n is the number of trials (the sample size) and p_i is the assumed
probability of the ith category.


The pie chart shows the distribution of health care visits to doctor offices,
emergency departments, and home visits in a recent year. (Source: National
Center for Health Statistics)

[Pie chart: 10 or more visits, 12.8%; one other category, 47.2%]

A researcher randomly selects 200 people and asks them how many visits
they make to the doctor in a year: 1-3, 4-9, 10 or more, or none. What is the
expected frequency for each response?


EXAMPLE 1

Finding Observed Frequencies and Expected Frequencies

A tax preparation company randomly selects 300 adults and asks them how they
prepare their taxes. The results are shown below. Find the observed frequency
and the expected frequency for each tax preparation method. (Adapted from
National Retail Federation)

Survey results (n = 300)
Accountant                 71
By hand                    40
Computer software          101
Friend/family              35
Tax preparation service    53

Solution


The observed frequency for each tax preparation method is the number of 
adults in the survey naming a particular tax preparation method. The expected 
frequency for each tax preparation method is the product of the number of 
adults in the survey and the probability that an adult will name a particular tax 
preparation method. The observed frequencies and expected frequencies are 
shown in the following table. 


Tax preparation method     % of people    Observed frequency    Expected frequency
Accountant                 25%            71                    300(0.25) = 75
By hand                    20%            40                    300(0.20) = 60
Computer software          35%            101                   300(0.35) = 105
Friend/family              5%             35                    300(0.05) = 15
Tax preparation service    15%            53                    300(0.15) = 45


INSIGHT

The sum of the expected frequencies always equals the sum of the observed
frequencies. For instance, in Example 1 the sum of the observed frequencies
and the sum of the expected frequencies are both 300.


Try It Yourself 1

Suppose the tax preparation company randomly selects 500 adults. Find the
expected frequency for each tax preparation method.

Multiply 500 by the probability that an adult will name each particular tax
preparation method to find the expected frequencies. Answer: Page A45
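
The expected frequencies in Example 1 follow from E_i = np_i applied to each category. The following is a short Python sketch, assuming the claimed percentages from the previous survey; the function name `expected_frequencies` is illustrative. Percentages are kept as whole numbers so the arithmetic stays exact.

```python
def expected_frequencies(n, percents):
    """Expected frequency E_i = n * p_i for each category (p_i given in percent)."""
    return {category: n * pct / 100 for category, pct in percents.items()}


# Claimed distribution from the previous survey (percent of people)
claimed = {
    "Accountant": 25,
    "By hand": 20,
    "Computer software": 35,
    "Friend/family": 5,
    "Tax preparation service": 15,
}

expected = expected_frequencies(300, claimed)
for category, frequency in expected.items():
    print(category, frequency)

# The expected frequencies always sum to the sample size.
print(sum(expected.values()))  # 300.0
```

Calling the same function with a different sample size gives the expected frequencies for that survey.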
STUDY TIP


Remember that a chi-square distribution is positively skewed
and its shape is determined by the degrees of freedom. The
graph is not symmetric, but it appears to become more symmetric
as the degrees of freedom increase, as shown in Section 6.4.


For the chi-square goodness-of-fit test to be used, the following must be true. 


1. The observed frequencies must be obtained using a random sample. 


2. Each expected frequency must be greater than or equal to 5. 


If the expected frequency of a category is less than 5, it may be possible to 
combine it with another category to meet the requirements. 


THE CHI-SQUARE GOODNESS-OF-FIT TEST 


If the conditions listed above are satisfied, then the sampling distribution for 
the goodness-of-fit test is approximated by a chi-square distribution with 
k — 1 degrees of freedom, where k is the number of categories. The test 
statistic for the chi-square goodness-of-fit test is 


\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

where O represents the observed frequency of each category and E represents 
the expected frequency of each category. 


When the observed frequencies closely match the expected frequencies, the 
differences between O and E will be small and the chi-square test statistic will 
be close to 0. As such, the null hypothesis is unlikely to be rejected. However, 
when there are large discrepancies between the observed frequencies and the 
expected frequencies, the differences between O and E will be large, resulting in 
a large chi-square test statistic. A large chi-square test statistic is evidence for 
rejecting the null hypothesis. So, the chi-square goodness-of-fit test is always a 
right-tailed test. 


GUIDELINES


Performing a Chi-Square Goodness-of-Fit Test
Verify that the expected frequency is at least 5 for each category.


IN WORDS                                          IN SYMBOLS

1. Identify the claim. State the null and         State H_0 and H_a.
   alternative hypotheses.

2. Specify the level of significance.             Identify α.

3. Determine the degrees of freedom.              d.f. = k - 1

4. Determine the critical value.                  Use Table 6 in Appendix B.

5. Determine the rejection region.

6. Find the test statistic and sketch the         χ² = Σ (O - E)² / E
   sampling distribution.

7. Make a decision to reject or fail to           If χ² is in the rejection
   reject the null hypothesis.                    region, reject H_0. Otherwise,
                                                  fail to reject H_0.

8. Interpret the decision in the context
   of the original claim.

EXAMPLE 2


> Performing a Chi-Square Goodness-of-Fit Test


The tax preparation methods of adults from a previous survey are distributed
as shown in the second column of the table below. A tax preparation company
randomly selects 300 adults and asks them how they prepare their taxes. The
results are shown in the third column. At α = 0.01, perform a chi-square
goodness-of-fit test to test whether the distributions are different. (Adapted
from National Retail Federation)


Tax preparation method      Previous survey    Survey results, f
Accountant                  25%                71
By hand                     20%                40
Computer software           35%                101
Friend/family               5%                 35
Tax preparation service     15%                53


> Solution


Tax preparation method      Observed frequency    Expected frequency
Accountant                  71                    75
By hand                     40                    60
Computer software           101                   105
Friend/family               35                    15
Tax preparation service     53                    45


The observed and expected frequencies are shown in the table above.
The expected frequencies were calculated in Example 1. Because the observed
frequencies were obtained using a random sample and each expected
frequency is at least 5, you can use the chi-square goodness-of-fit test to test
the proposed distribution. The null and alternative hypotheses are as follows.


H0: The distribution of tax preparation methods is 25% by accountant,
20% by hand, 35% by computer software, 5% by friend or family, and
15% by tax preparation service.


Ha: The distribution of tax preparation methods differs from the claimed
or expected distribution. (Claim)


Because there are 5 categories, the chi-square distribution has k − 1 =
5 − 1 = 4 degrees of freedom. With d.f. = 4 and α = 0.01, the critical value
is χ²0 = 13.277. With the observed and expected frequencies, the chi-square
test statistic is


$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(71 - 75)^2}{75} + \frac{(40 - 60)^2}{60} + \frac{(101 - 105)^2}{105} + \frac{(35 - 15)^2}{15} + \frac{(53 - 45)^2}{45} \approx 35.121.$$


The graph at the left shows the location of the rejection region, χ² > 13.277.
Because χ² ≈ 35.121 is in the rejection region, you should reject the null hypothesis.


Interpretation There is enough evidence at the 1% level of significance to 
conclude that the distribution of tax preparation methods differs from the 
previous survey’s claimed or expected distribution. 
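
The steps of Example 2 can also be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original example: the function names `expected_frequencies` and `chi_square_statistic` are illustrative choices, and the critical value is simply copied from the text rather than computed.

```python
# Sketch of the goodness-of-fit steps, using the Example 2 survey data.
# Function names are illustrative; the critical value is quoted from the text.

claimed = {
    "Accountant": 0.25,
    "By hand": 0.20,
    "Computer software": 0.35,
    "Friend/family": 0.05,
    "Tax preparation service": 0.15,
}
observed = {
    "Accountant": 71,
    "By hand": 40,
    "Computer software": 101,
    "Friend/family": 35,
    "Tax preparation service": 53,
}


def expected_frequencies(proportions, n):
    """Expected frequency of each category: E = n * p."""
    return {category: n * p for category, p in proportions.items()}


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((observed[c] - expected[c]) ** 2 / expected[c] for c in expected)


n = sum(observed.values())                       # sample size (300 adults)
expected = expected_frequencies(claimed, n)
condition_met = all(e >= 5 for e in expected.values())
chi2 = chi_square_statistic(observed, expected)
critical_value = 13.277                          # d.f. = 4, alpha = 0.01
reject_null = chi2 > critical_value              # right-tailed test

print(f"chi-square = {chi2:.3f}, reject H0: {reject_null}")
```

Because the test is right-tailed, the decision reduces to a single comparison of the test statistic with the critical value.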
Ages     Previous age distribution     Survey results
0-9      16%                           76
10-19    20%                           84
20-29    8%                            30
30-39    14%                           60
40-49    15%                           54
50-59    12%                           40
60-69    10%                           42
70+      5%                            14


Color     Frequency, f
Brown     80
Yellow    95
Red       88
Blue      83
Orange    76
Green     78


> Try It Yourself 2 


A sociologist claims that the age distribution for the residents of a certain city
is different than it was 10 years ago. The distribution of ages 10 years ago is
shown in the table at the left. You randomly select 400 residents and record the
age of each. The survey results are shown in the table. At α = 0.05, perform a
chi-square goodness-of-fit test to test whether the distribution has changed.


a. Verify that the expected frequency is at least 5 for each category.
b. Identify the claimed distribution and state H0 and Ha.
c. Specify the level of significance α.
d. Determine the degrees of freedom.
e. Determine the critical value and the rejection region.
f. Find the chi-square test statistic. Sketch a graph.
g. Decide whether to reject the null hypothesis.
h. Interpret the decision in the context of the original claim.

Answer: Page A45


The chi-square goodness-of-fit test is often used to determine whether a 
distribution is uniform. For such tests, the expected frequencies of the categories 
are equal. When testing a uniform distribution, you can find the expected 
frequency of each category by dividing the sample size by the number of 
categories. For instance, suppose a company believes that the number of sales 
made by its sales force is uniform throughout the five-day work week. If the 
sample consists of 1000 sales, then the expected value of the sales for each day 
will be 1000/5 = 200. 
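
As a quick illustration of the uniform case, the sketch below computes the expected frequency for the sales example and then a test statistic against a uniform distribution. It is written in Python as an assumption of convenience; the function names are illustrative, and the daily sales counts are made up for the demonstration (they do not come from the text).

```python
# Sketch: expected frequencies and test statistic when the claimed
# distribution is uniform. Function names are illustrative.

def uniform_expected_frequency(sample_size, num_categories):
    """Each category has the same expected frequency: E = n / k."""
    return sample_size / num_categories


def chi_square_uniform(observed_counts):
    """Chi-square statistic of observed counts against a uniform distribution."""
    expected = uniform_expected_frequency(sum(observed_counts), len(observed_counts))
    return sum((o - expected) ** 2 / expected for o in observed_counts)


# Sales example from the text: 1000 sales over a five-day work week.
sales_expected = uniform_expected_frequency(1000, 5)

# Hypothetical daily counts (NOT from the text), chosen only to show the call.
hypothetical_daily_sales = [210, 190, 205, 195, 200]
statistic = chi_square_uniform(hypothetical_daily_sales)

print(sales_expected, statistic)
```

Counts that match the expected frequency exactly give a statistic of 0, and the statistic grows as the counts drift away from n/k.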


EXAMPLE 3


> Performing a Chi-Square Goodness-of-Fit Test


A researcher claims that the number of different-colored candies in bags
of dark chocolate M&M’s is uniformly distributed. To test this claim, you
randomly select a bag that contains 500 dark chocolate M&M’s. The results
are shown in the table at the left. At α = 0.10, perform a chi-square
goodness-of-fit test to test the claimed or expected distribution. (Adapted from
Mars, Incorporated)


> Solution 


The claim is that the distribution is uniform, so the expected frequencies of the
colors are equal. To find each expected frequency, divide the sample size by the
number of colors. So, for each color, E = 500/6 ≈ 83.33. Because each
expected frequency is at least 5 and the M&M’s were randomly selected, you
can use the chi-square goodness-of-fit test to test the claimed distribution. The
null and alternative hypotheses are as follows.


H0: The distribution of the different-colored candies in bags of dark
chocolate M&M’s is uniform. (Claim)


Ha: The distribution of the different-colored candies in bags of dark
chocolate M&M’s is not uniform.


Because there are 6 categories, the chi-square distribution has k − 1 =
6 − 1 = 5 degrees of freedom. Using d.f. = 5 and α = 0.10, the critical value
is χ²0 = 9.236. With the observed and expected frequencies, the chi-square test
statistic is shown in the following table.
O      E        O − E     (O − E)²     (O − E)²/E
80     83.33    −3.33     11.0889      0.1330721229
95     83.33    11.67     136.1889     1.6343321733
88     83.33    4.67      21.8089      0.2617172687
83     83.33    −0.33     0.1089       0.0013068523
76     83.33    −7.33     53.7289      0.6447725909
78     83.33    −5.33     28.4089      0.3409204368

$$\chi^2 = \sum \frac{(O - E)^2}{E} \approx 3.016$$


STUDY TIP
Another way to calculate the chi-square test statistic is to organize
the calculations in a table.


The graph shows the location of the rejection region and the chi-square test
statistic. Because χ² is not in the rejection region, you should fail to reject the
null hypothesis.

[Graph: chi-square distribution with d.f. = 5; rejection region (α = 0.10) to the right of χ²0 = 9.236; test statistic χ² ≈ 3.016]


Interpretation There is not enough evidence at the 10% level of significance
to reject the claim that the distribution of the different-colored candies in bags
of dark chocolate M&M’s is uniform.


> Try It Yourself 3


A researcher claims that the number of different-colored candies in bags of
peanut M&M’s is uniformly distributed. To test this claim, you randomly select
a bag that contains 180 peanut M&M’s. The results are shown in the table at
the left. Using α = 0.05, perform a chi-square goodness-of-fit test to test the
claimed or expected distribution. (Adapted from Mars, Incorporated)


Color     Frequency, f
Brown     22
Yellow    27
Red       22
Blue      41
Green     27
Orange    41


a. Verify that the expected frequency is at least 5 for each category.
b. Identify the claimed distribution and state H0 and Ha.
c. Specify the level of significance α.
d. Determine the degrees of freedom.
e. Determine the critical value and the rejection region.
f. Find the chi-square test statistic. Sketch a graph.
g. Decide whether to reject the null hypothesis.
h. Interpret the decision in the context of the original claim.

Answer: Page A46
BUILDING BASIC SKILLS AND VOCABULARY


1. What is a multinomial experiment? 


2. What conditions are necessary to use the chi-square goodness-of-fit test? 


Finding Expected Frequencies In Exercises 3-6, find the expected frequency
for the given values of n and p_i.


3. n = 150, p_i = 0.3          4. n = 500, p_i = 0.9
5. n = 230, p_i = 0.25         6. n = 415, p_i = 0.08


USING AND INTERPRETING CONCEPTS


Performing a Chi-Square Goodness-of-Fit Test In Exercises 7-16,
(a) identify the claimed distribution and state H0 and Ha, (b) find the critical value
and identify the rejection region, (c) find the chi-square test statistic, (d) decide
whether to reject or fail to reject the null hypothesis, and (e) interpret the decision
in the context of the original claim.


7. Ages of Moviegoers Results from a previous survey asking people who go 
to movies at least once a month for their ages are shown in the graph. To 
determine whether this distribution is still the same, you randomly select 1000 
people who go to movies at least once a month and record the age of each. The 
results are shown in the table. At α = 0.10, are the distributions the same?
(Source: Motion Picture Association of America)


[Graph: ages of moviegoers (previous survey): 19.8%, 26.7%, 19.7%, 19.8%, 14%; age-class labels not recoverable]


8. Coffee Results from a previous survey asking coffee drinkers how much 
coffee they drink are shown in the graph. To determine whether this 
distribution is still the same, you randomly select 1600 coffee drinkers and ask 
them how much coffee they drink. The results are shown in the table. 
At α = 0.05, are the distributions the same? (Source: Braun Research)


[Graph: How much coffee do you drink? (previous survey); percentages not fully recoverable]

Response                 Frequency, f
1 cup a week             193
1 cup a day              462
2 or more cups a day     739
9. Ordering Delivery Results from a previous survey asking people which day
of the week they are most likely to order food for delivery are shown in the
graph. To determine whether this distribution has changed, you randomly
select 500 people and record which day of the week each is most likely to
order food for delivery. The results are shown in the table. At α = 0.01, can
you conclude that there has been a change in the claimed or expected
distribution? (Source: Technomic, Inc.)


[Graph: Food at your door: day of the week Americans are most likely to order; percentages not recoverable]

Day          Frequency, f
Sunday       43
Monday       16
Tuesday      25
Wednesday    49
Thursday     46
Friday       168
Saturday     153


10. Reasons Workers Leave A personnel director believes that the distribution 
of the reasons workers leave their jobs is different from the one shown in the 
graph. The director randomly selects 200 workers who recently left their jobs 
and records each worker’s reason for doing so. The results are shown in the 
table. At α = 0.01, are the distributions different? (Source: Robert Half
International, Inc.)


[Graph: reasons workers leave their jobs; percentages not recoverable]

Response                          Frequency, f
Limited advancement potential     78
Lack of recognition               52
Low salary/benefits               30
Unhappy with mgmt.                25
Bored/don’t know                  15


11. Homicides by Season A researcher believes that the number of homicide 
crimes in California by season is uniformly distributed. To test this claim, you 
randomly select 1200 homicides from a recent year and record the season 
when each happened. The results are shown in the table. At α = 0.05, can
you reject the claim that the distribution is uniform? (Adapted from California
Department of Justice)


Spring 312 
Summer 299 
Fall 297 
Winter 292 
12. Homicides by Month A researcher believes that the number of homicide
crimes in California by month is uniformly distributed. To test this claim, you
randomly select 1200 homicides from a recent year and record the month
when each happened. The results are shown in the table. At α = 0.10, can
you reject the claim that the distribution is uniform? (Adapted from California
Department of Justice)


January 98 July 84 
February 103 August 109 
March 114 September 112 
April 92 October 95 
May 106 November 91 
June 106 December 90 


13. College Education The pie chart shows the distribution of the opinions 
of U.S. parents on whether a college education is worth the expense. An 
economist believes that the distribution of the opinions of U.S. teenagers is 
different from the distribution for U.S. parents. The economist randomly 
selects 200 U.S. teenagers and asks each whether a college education is 
worth the expense. The results are shown in the table. At α = 0.05, are the
distributions different? (Adapted from Upromise, Inc.)


[Pie chart: opinions of U.S. parents. Strongly agree 55%, Somewhat agree 30%, Neither agree nor disagree 5%, Somewhat disagree 6%, Strongly disagree 4%]

Response                       Frequency, f
Strongly agree                 86
Somewhat agree                 62
Neither agree nor disagree     34
Somewhat disagree              14
Strongly disagree              4


14. Saving for the Future The pie chart shows the distribution of the opinions 
of U.S. male adults on which is more important to save for, your child’s 
college education or your own retirement. A financial services company 
believes that the distribution of the opinions of U.S. female adults is the same 
as the distribution for U.S. male adults. The company randomly selects 
400 U.S. female adults and asks each which is more important—saving for 
your child’s college education or saving for your own retirement. The results 
are shown in the table. At α = 0.10, are the distributions the same? (Adapted
from Country Financial)


[Pie chart: opinions of U.S. male adults. Saving for your own retirement 37%; remaining percentages not recoverable]

Response                                      Frequency, f
Saving for your child’s college education     180
Saving for your own retirement                172
Not sure                                      48
15. Home Sizes An organization believes that the number of prospective home
buyers who want their next house to be larger, smaller, or the same size as
their current house is uniformly distributed. To test this claim, you randomly
select 800 prospective home buyers and ask them what size they want their
next house to be. The results are shown in the table. At α = 0.05, can you
reject the claim that the distribution is uniform? (Adapted from Better Homes
and Gardens)


Larger 285
Same size 224
Smaller 291


16. Births by Day of the Week A doctor believes that the number of births by
day of the week is uniformly distributed. To test this claim, you randomly
select 700 births from a recent year and record the day of the week on which
each takes place. The results are shown below. At α = 0.01, can you reject the
claim that the distribution is uniform? (Adapted from National Center for Health
Statistics)


Sunday 65 
Monday 103 
Tuesday 114 
Wednesday 116 
Thursday 115 
Friday 112 
Saturday 75 


In Exercises 17 and 18, use StatCrunch to perform a chi-square goodness-of-fit
test. Decide whether to reject the null hypothesis. Then, interpret the decision in the
context of the original claim.


17. Favorite Sport Results from a survey five years ago asking U.S. adults
their favorite sport are shown in the pie chart. To determine whether this
distribution has changed, a research organization randomly selects 400 U.S.
adults and records each adult’s favorite sport. The results are shown in the
table. At α = 0.10, can you conclude that there has been a change in the
claimed or expected distribution? (Adapted from Harris Interactive)

[Pie chart: favorite sport five years ago. Pro football 30%, Baseball 15%, Other/not sure 13%, College football 11%, Auto racing 7%, Pro basketball 7%, College basketball 6%, Golf 4%, Hockey 4%, Soccer 3%]

Response               Frequency, f
Auto racing            36
Baseball               64
College basketball     12
College football       48
Golf                   16
Hockey                 16
Other/not sure         40
Pro basketball         20
Pro football           140
Soccer                 8
18. Paying Bills The pie chart shows the distribution of the opinions of U.S.
adults who are married on how long they could go between jobs without any
income and still be able to pay all of their bills on time. A researcher believes
that the distribution of the opinions of U.S. adults who are not married is
different from the distribution for U.S. adults who are married. The
researcher randomly selects 250 U.S. adults who are not married and asks
them how long they could go between jobs without any income and still be
able to pay all of their bills on time. The results are shown in the table. At
α = 0.01, are the distributions different? (Adapted from Country Financial)


[Pie chart: opinions of U.S. adults who are married. More than five months 25%, Two months 11%, Three months 7%, Four months 4%, Five months 3%; percentages for None, One month, and Not sure not recoverable]

Response                  Frequency, f
None                      83
One month                 46
Two months                37
Three months              11
Four months               9
Five months               7
More than five months     52
Not sure                  5


EXTENDING CONCEPTS


Testing for Normality Using a chi-square goodness-of-fit test, you can decide,
with some degree of certainty, whether a variable is normally distributed. In all
chi-square tests for normality, the null and alternative hypotheses are as follows.


H0: The variable has a normal distribution.


Ha: The variable does not have a normal distribution.


To determine the expected frequencies when performing a chi-square test for 
normality, first find the mean and standard deviation of the frequency distribution. 
Then, use the mean and standard deviation to compute the z-score for each class 
boundary. Then, use the z-scores to calculate the area under the standard normal 
curve for each class. Multiplying the resulting class areas by the sample size yields 
the expected frequency for each class. 
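
The procedure for finding expected frequencies in a test for normality can be sketched in code. The Python sketch below is illustrative only and rests on two assumptions that the text does not spell out: the mean and standard deviation are estimated from the class midpoints, and the sample standard deviation (dividing by n − 1) is used. The function names are hypothetical, and the data are the class boundaries and frequencies given in Exercise 19.

```python
import math

# Sketch: expected frequencies for a chi-square test for normality.
# Assumptions: grouped mean/sd use class midpoints; sd divides by n - 1.


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def grouped_mean_and_sd(boundaries, frequencies):
    """Mean and sample standard deviation of a frequency distribution."""
    midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    n = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / (n - 1)
    return mean, math.sqrt(variance)


def normal_expected_frequencies(boundaries, frequencies):
    """Class area under the normal curve times the sample size, per class."""
    n = sum(frequencies)
    mean, sd = grouped_mean_and_sd(boundaries, frequencies)
    z_scores = [(b - mean) / sd for b in boundaries]      # z-score of each boundary
    areas = [normal_cdf(hi) - normal_cdf(lo) for lo, hi in zip(z_scores, z_scores[1:])]
    return [n * area for area in areas]


# Class boundaries and frequencies from Exercise 19 (200 test scores).
boundaries = [49.5, 58.5, 67.5, 76.5, 85.5, 94.5]
frequencies = [19, 61, 82, 34, 4]

mean, sd = grouped_mean_and_sd(boundaries, frequencies)
expected = normal_expected_frequencies(boundaries, frequencies)
print(round(mean, 3), [round(e, 2) for e in expected])
```

The expected frequencies sum to slightly less than the sample size because the areas in the two tails, outside the lowest and highest class boundaries, are not assigned to any class.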


In Exercises 19 and 20, (a) find the expected frequencies, (b) find the critical value 
and identify the rejection region, (c) find the chi-square test statistic, (d) decide 
whether to reject or fail to reject the null hypothesis, and (e) interpret the decision 
in the context of the original claim. 


19. Test Scores The frequency distribution shows the results of 200 test scores.
Are the test scores normally distributed? Use α = 0.01.


Class boundaries    49.5-58.5    58.5-67.5    67.5-76.5    76.5-85.5    85.5-94.5
Frequency, f        19           61           82           34           4


20. Test Scores At α = 0.05, test the claim that the 400 test scores shown in the
frequency distribution are normally distributed.


Class boundaries    50.5-60.5    60.5-70.5    70.5-80.5    80.5-90.5    90.5-100.5
Frequency, f        28           106          151          97           18
Independence


WHAT YOU SHOULD LEARN
» How to use a contingency table to find expected frequencies
» How to use a chi-square distribution to test whether two variables are independent


Contingency Tables > The Chi-Square Test for Independence


> CONTINGENCY TABLES


In Section 3.2, you learned that two events are independent if the occurrence of
one event does not affect the probability of the occurrence of the other event. For
instance, the outcomes of a roll of a die and a toss of a coin are independent. But,
suppose a medical researcher wants to determine if there is a relationship
between caffeine consumption and heart attack risk. Are these variables
independent or are they dependent? In this section, you will learn how to use the
chi-square test for independence to answer such a question. To perform a
chi-square test for independence, you will use sample data that are organized in
a contingency table.


DEFINITION


An r × c contingency table shows the observed frequencies for two variables.
The observed frequencies are arranged in r rows and c columns. The
intersection of a row and a column is called a cell.


For instance, the following table is a 2 × 5 contingency table. It has two rows and
five columns and shows the results of a random sample of 2200 adults classified
by their favorite way to eat ice cream and gender. From the table, you can see
that 204 of the adults who prefer ice cream in a sundae are males, and 180 of the
adults who prefer ice cream in a sundae are females.


Gender    Favorite way to eat ice cream (columns 1-5; column 2 = cone, column 3 = sundae)
Male      600    288    204    24    84
Female    410    340    180    20    50
(Adapted from Harris Interactive)


Assuming the two variables of study in a contingency table are independent, you 
can use the contingency table to find the expected frequency for each cell. The 
formula for calculating the expected frequency for each cell is given below. 


FINDING THE EXPECTED FREQUENCY FOR CONTINGENCY TABLE CELLS


The expected frequency for a cell E_{r,c} in a contingency table is

$$\text{Expected frequency } E_{r,c} = \frac{(\text{Sum of row } r) \times (\text{Sum of column } c)}{\text{Sample size}}.$$


STUDY TIP
In a contingency table, the notation E_{r,c} represents the expected frequency
for the cell in row r, column c. For instance, in the table above, E_{1,4}
represents the expected frequency for the cell in row 1, column 4.


When you find the sum of each row and column in a contingency table, you 
are calculating the marginal frequencies. A marginal frequency is the frequency 
that an entire category of one of the variables occurs. For instance, in the table 
above, the marginal frequency for adults who prefer ice cream in a cone is 
288 + 340 = 628. The observed frequencies in the interior of a contingency 
table are called joint frequencies. The marginal frequencies for the contingency 
table in Example 1 have already been calculated. 
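
The marginal frequencies and the expected-frequency formula can be sketched in a few lines of code. The Python below is a minimal illustration, not part of the original text; the function names are hypothetical, and the data are the observed frequencies from the ice cream contingency table above (rows: male, female).

```python
# Sketch: marginal frequencies and expected frequencies for a contingency table.
# Function names are illustrative; data are the ice cream table from the text.


def marginal_totals(table):
    """Return (row totals, column totals, sample size) of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


def expected_table(table):
    """E[r][c] = (sum of row r) * (sum of column c) / sample size."""
    row_totals, col_totals, n = marginal_totals(table)
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [
    [600, 288, 204, 24, 84],   # male
    [410, 340, 180, 20, 50],   # female
]

row_totals, col_totals, n = marginal_totals(observed)
expected = expected_table(observed)

for row in expected:
    print([round(e, 2) for e in row])
```

Each row of expected frequencies sums to the corresponding row total, which is why the last cell in a row or column can also be found by subtraction.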
INSIGHT


In Example 1, once the expected frequency for E_{1,1} has been
calculated to be 550.91, you can determine the expected
frequency for E_{2,1} to be 1010 − 550.91 = 459.09.
That is, the expected frequency for the last cell in each row or
column can be found by subtracting from the total.
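EXAMPLE 1


> Finding Expected Frequencies


Find the expected frequency for each cell in the contingency table. Assume
that the variables, favorite way to eat ice cream and gender, are independent.


Male      600     288    204    24    84     1200
Female    410     340    180    20    50     1000
Total     1010    628    384    44    134    2200


> Solution
After calculating the marginal frequencies, you can use the formula

$$\text{Expected frequency } E_{r,c} = \frac{(\text{Sum of row } r) \times (\text{Sum of column } c)}{\text{Sample size}}$$

to find each expected frequency as shown.

$$E_{1,1} = \frac{1200 \cdot 1010}{2200} \approx 550.91 \qquad E_{1,2} = \frac{1200 \cdot 628}{2200} \approx 342.55$$
$$E_{1,3} = \frac{1200 \cdot 384}{2200} \approx 209.45 \qquad E_{1,4} = \frac{1200 \cdot 44}{2200} = 24$$
$$E_{1,5} = \frac{1200 \cdot 134}{2200} \approx 73.09 \qquad E_{2,1} = \frac{1000 \cdot 1010}{2200} \approx 459.09$$
$$E_{2,2} = \frac{1000 \cdot 628}{2200} \approx 285.45 \qquad E_{2,3} = \frac{1000 \cdot 384}{2200} \approx 174.55$$
$$E_{2,4} = \frac{1000 \cdot 44}{2200} = 20 \qquad E_{2,5} = \frac{1000 \cdot 134}{2200} \approx 60.91$$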


> Try It Yourself 1 


The marketing consultant for a travel agency wants to determine whether 
certain travel concerns are related to travel purpose. The contingency table 
shows the results of a random sample of 300 travelers classified by their 
primary travel concern and travel purpose. Assume that the variables travel 
concern and travel purpose are independent. Find the expected frequency for 
each cell. (Adapted from NPD Group for Embassy Suites) 


Business 36 108 14 22 
Leisure 38 54 14 14 


a. Calculate the marginal frequencies. 
b. Determine the sample size. 
c. Use the formula to find the expected frequency for each cell. 
Answer: Page A46 
A researcher wants to determine whether a relationship exists
between where people work (workplace or home) and their
educational attainment. The results of a random sample of
925 employed persons are shown in the contingency table. (Adapted
from U.S. Bureau of Labor Statistics)

[Contingency table: where people work (workplace or home) by educational attainment (Less than high school; High school diploma; Some college; BA degree or higher); cell frequencies not recoverable]

Can the researcher use this sample to test for independence
using a chi-square independence test? Why or why not?
> THE CHI-SQUARE TEST FOR INDEPENDENCE


After finding the expected frequencies, you can test whether the variables are 
independent using a chi-square independence test. 


DEFINITION 


A chi-square independence test is used to test the independence of two 
variables. Using a chi-square test, you can determine whether the occurrence 
of one variable affects the probability of the occurrence of the other variable. 


For the chi-square independence test to be used, the following conditions 
must be true. 


1. The observed frequencies must be obtained using a random sample. 
2. Each expected frequency must be greater than or equal to 5. 


THE CHI-SQUARE INDEPENDENCE TEST 


If the conditions listed above are satisfied, then the sampling distribution for
the chi-square independence test is approximated by a chi-square distribution
with

$$(r - 1)(c - 1)$$

degrees of freedom, where r and c are the number of rows and columns,
respectively, of a contingency table. The test statistic for the chi-square
independence test is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O represents the observed frequencies and E represents the expected
frequencies.


To begin the independence test, you must first state a null hypothesis and an 
alternative hypothesis. For a chi-square independence test, the null and alternative 
hypotheses are always some variation of the following statements. 


H0: The variables are independent.


Ha: The variables are dependent.


The expected frequencies are calculated on the assumption that the two 
variables are independent. If the variables are independent, then you can expect 
little difference between the observed frequencies and the expected frequencies. 
When the observed frequencies closely match the expected frequencies, the 
differences between O and E will be small and the chi-square test statistic will be 
close to 0. As such, the null hypothesis is unlikely to be rejected. 

However, if the variables are dependent, there will be large discrepancies 
between the observed frequencies and the expected frequencies. When the 
differences between O and E are large, the chi-square test statistic is also large. 
A large chi-square test statistic is evidence for rejecting the null hypothesis. So, 
the chi-square independence test is always a right-tailed test. 
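
To make the degrees of freedom and the test statistic concrete, here is a minimal Python sketch. It is not part of the original text: the function name `chi_square_independence` is an illustrative choice, and the data are the ice cream contingency table introduced earlier in this section. The expected frequencies are computed without rounding, so the result can differ in the third decimal place from a hand calculation that uses expected frequencies rounded to two decimals.

```python
# Sketch: degrees of freedom and test statistic for a chi-square
# independence test. The function name is illustrative.


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency E_{i,j}
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1)(c - 1)
    return chi2, df


ice_cream = [
    [600, 288, 204, 24, 84],   # male
    [410, 340, 180, 20, 50],   # female
]

chi2, df = chi_square_independence(ice_cream)
print(f"chi-square = {chi2:.2f}, d.f. = {df}")
```

If the observed table matched the expected frequencies exactly, every term in the sum would be 0, which is why only large values of the statistic count as evidence against independence.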
STUDY TIP


A contingency table with three rows and four columns will have
(3 − 1)(4 − 1) = (2)(3) = 6 d.f.
GUIDELINES


Performing a Chi-Square Test for Independence


IN WORDS                                        IN SYMBOLS
1. Identify the claim. State the null and       State H0 and Ha.
   alternative hypotheses.
2. Specify the level of significance.           Identify α.
3. Determine the degrees of freedom.            d.f. = (r − 1)(c − 1)
4. Determine the critical value.                Use Table 6 in Appendix B.
5. Determine the rejection region.
6. Find the test statistic and sketch the       χ² = Σ (O − E)² / E
   sampling distribution.
7. Make a decision to reject or fail to         If χ² is in the rejection region, reject H0.
   reject the null hypothesis.                  Otherwise, fail to reject H0.
8. Interpret the decision in the context
   of the original claim.


EXAMPLE 2


> Performing a Chi-Square Independence Test


The contingency table shows the results of a random sample of 2200 adults
classified by their favorite way to eat ice cream and gender. The expected
frequencies are displayed in parentheses. At α = 0.01, can you conclude that
the adults’ favorite ways to eat ice cream are related to gender?


Male   | 600 (550.91) | 288 (342.55) | 204 (209.45) | 24 (24) | 84 (73.09) | 1200
Female | 410 (459.09) | 340 (285.45) | 180 (174.55) | 20 (20) | 50 (60.91) | 1000
Total  | 1010         | 628          | 384          | 44      | 134        | 2200

> Solution


The expected frequencies were calculated in Example 1. Because each expected
frequency is at least 5 and the adults were randomly selected, you can use the
chi-square independence test to test whether the variables are independent.
The null and alternative hypotheses are as follows.


H0: The adults’ favorite ways to eat ice cream are independent of gender.


Ha: The adults’ favorite ways to eat ice cream are dependent on gender.
(Claim)
Because the contingency table has two rows and five columns, the chi-square
distribution has (r − 1)(c − 1) = (2 − 1)(5 − 1) = 4 degrees of freedom.
Because d.f. = 4 and α = 0.01, the critical value is χ²0 = 13.277. With the
observed and expected frequencies, the chi-square test statistic is as shown.


O      E         O − E      (O − E)²      (O − E)²/E
600    550.91    49.09      2409.8281     4.3743
288    342.55    −54.55     2975.7025     8.6869
204    209.45    −5.45      29.7025       0.1418
24     24        0          0             0
84     73.09     10.91      119.0281      1.6285
410    459.09    −49.09     2409.8281     5.2491
340    285.45    54.55      2975.7025     10.4246
180    174.55    5.45       29.7025       0.1702
20     20        0          0             0
50     60.91     −10.91     119.0281      1.9542

$$\chi^2 = \sum \frac{(O - E)^2}{E} \approx 32.630$$


The graph at the left shows the location of the rejection region. Because
χ² ≈ 32.630 is in the rejection region, you should decide to reject the null
hypothesis.

[Graph: chi-square distribution with d.f. = 4; rejection region (α = 0.01) to the right of χ²0 = 13.277]

Interpretation There is enough evidence at the 1% level of significance
to conclude that the adults’ favorite ways to eat ice cream and gender are
dependent.


> Try It Yourself 2 


The marketing consultant for a travel agency wants to determine whether
travel concerns are related to travel purpose. The contingency table shows the
results of a random sample of 300 travelers classified by their primary travel
concern and travel purpose. At α = 0.01, can the consultant conclude that the
travel concerns depend on the purpose of travel? (The expected frequencies
are displayed in parentheses.) (Adapted from NPD Group for Embassy Suites)


Business | 36 (44.4) | 108 (97.2) | 14 (16.8) | 22 (21.6) | 180
Leisure  | 38 (29.6) | 54 (64.8)  | 14 (11.2) | 14 (14.4) | 120
Total    | 74        | 162        | 28        | 36        | 300


a. Identify the claim and state H0 and Ha.
b. Specify the level of significance α.
c. Determine the degrees of freedom.
d. Determine the critical value and the rejection region.
e. Use the observed and expected frequencies to find the chi-square test statistic.
   Sketch a graph.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.

Answer: Page A46
TI-83/84 PLUS
χ²-Test
Observed: [A]
Expected: [B]
Calculate Draw


TI-83/84 PLUS
χ²-Test
χ² = 3.493357223
p = .321624691
df = 3


TI-83/84 PLUS
χ² = 3.4934    p = .3216


STUDY TIP


You can also use a P-value to perform a chi-square test for
independence. For instance, in Example 3, note that
the TI-83/84 Plus displays P = .321624691. Because
P > α, you should fail to reject the null hypothesis.

EXAMPLE 3 


>» Using Technology for a Chi-Square Independence Test 


A health club manager wants to determine whether the number of days per 
week that college students spend exercising is related to gender. A random 
sample of 275 college students is selected and the results are classified as 
shown in the table. At α = 0.05, is there enough evidence to conclude that the
number of days spent exercising per week is related to gender?


Male      40    53     26    6     125
Female    34    68     37    11    150
Total     74    121    63    17    275


> Solution The null and alternative hypotheses can be stated as follows.


H0: The number of days spent exercising per week is independent of gender.
Ha: The number of days spent exercising per week depends on gender. (Claim)


Using a TI-83/84 Plus, enter the observed frequencies into Matrix A and the
expected frequencies into Matrix B, making sure that each expected frequency
is greater than or equal to 5. To perform a chi-square independence test,
begin with the STAT keystroke and choose the TESTS menu and select
C: χ²-Test. Then set up the chi-square test as shown in the top-left screen.

The other displays at the left show the results of selecting Calculate or
Draw. Because d.f. = 3 and α = 0.05, the critical value is χ²0 = 7.815. So,
the rejection region is χ² > 7.815. The test statistic χ² ≈ 3.493 is not in the
rejection region, so you should fail to reject the null hypothesis.


Interpretation There is not enough evidence to conclude that the number of 
days spent exercising per week is related to gender. 
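
The P-value reported by the calculator in Example 3 can be reproduced without special software. The Python sketch below is an illustration under stated assumptions, not the calculator's own method: it uses the standard closed-form expressions for the right-tail area of a chi-square distribution with a whole-number degrees of freedom, and the function name `chi_square_p_value` is hypothetical. The test statistic and degrees of freedom are the values shown on the TI-83/84 Plus screen.

```python
import math

# Sketch: right-tail area (P-value) of a chi-square distribution with a
# positive integer number of degrees of freedom, using closed-form series.


def chi_square_p_value(x, df):
    """P(chi-square with df degrees of freedom >= x), df a positive integer."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #   + sqrt(2/pi) * exp(-x/2) * sum_{i=1}^{(df-1)/2} x^(i - 1/2) / (1*3*...*(2i-1))
    term, total = math.sqrt(x), 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


alpha = 0.05
chi2 = 3.493357223      # test statistic from the calculator screen in Example 3
df = 3                  # (2 - 1)(4 - 1)

p_value = chi_square_p_value(chi2, df)
reject_null = p_value <= alpha
print(f"P = {p_value:.4f}, reject H0: {reject_null}")
```

The same function gives a right-tail area of about 0.01 at the critical value 13.277 with d.f. = 4, which agrees with the critical value used in Example 2.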


> Try It Yourself 3 


A researcher wants to determine if age is related to whether or not a tax cut 
would influence an adult to purchase a hybrid vehicle. A random sample of 
1250 adults is selected and the results are classified as shown in the table. At 
α = 0.01, is there enough evidence to conclude that the adults’ ages are
related to the response? (Adapted from HNTB)


Yes 257 189 143 589 
No 218 261 182 661 
Total 475 450 325 1250 


a. Identify the claim and state H0 and Ha.
b. Use a technology tool to enter the observed and expected frequencies into
   matrices.
c. Determine the critical value and the rejection region.
d. Use the technology tool to find the chi-square test statistic.
e. Decide whether to reject the null hypothesis. Use a graph if necessary.
f. Interpret the decision in the context of the original claim.
Answer: Page A46
BUILDING BASIC SKILLS AND VOCABULARY


1. Explain how to find the expected frequency for a cell in a contingency table. 


2. Explain the difference between marginal frequencies and joint frequencies 
in a contingency table. 


3. Explain how the chi-square test for independence and the chi-square 
goodness-of-fit test are similar. How are they different? 


4. Explain why the chi-square independence test is always a right-tailed test.


True or False? In Exercises 5 and 6, determine whether the statement is true or 
false. If it is false, rewrite it as a true statement. 


5. If the two variables of the chi-square test for independence are dependent, 
then you can expect little difference between the observed frequencies and 
the expected frequencies. 


6. If the test statistic for the chi-square independence test is large, you will, in 
most cases, reject the null hypothesis. 


Finding Expected Frequencies In Exercises 7-12, (a) calculate the marginal
frequencies, and (b) find the expected frequency for each cell in the contingency 
table. Assume that the variables are independent. 


ae 
Injury 18 22 
No injury 211 189 


Nausea 36 13 
No nausea 254 262 


Teller 92 351 50 
Customer service 76 42 8 
representative 

10. 
Seats 100 or fewer 182 203 165 
Seats over 100 180 311 159 
11.


| Female | 24 32 | 20 14 


Comedy ; 38 | 30 | 24 | 10 | 8 
| Action 15 17 16 9 5 
| Drama 12 u 19 25 13 


USING AND INTERPRETING CONCEPTS


Performing a Chi-Square Test for Independence In Exercises 13-22, 
perform the indicated chi-square test for independence by doing the following. 


(a) Identify the claim and state the null and alternative hypotheses. 


(b) Determine the degrees of freedom, find the critical value, and identify the
rejection region.

(c) Calculate the test statistic. If convenient, use technology.

(d) Decide to reject or fail to reject the null hypothesis. Then interpret the decision
in the context of the original claim.


13. Achievement and School Location Is achieving a basic skill level in a
subject related to the location of the school? The results of a random sample
of students by the location of school and the number of those students
achieving basic skill levels in three subjects are shown in the contingency
table. At $\alpha = 0.01$, test the hypothesis that the variables are independent.
(Adapted from HUD State of the Cities Report)


Urban
Suburban 63 66 65


14. Attitudes about Safety The results of a random sample of students by type
of school and their attitudes on safety steps taken by the school staff are
shown in the contingency table. At $\alpha = 0.01$, can you conclude that attitudes
about the safety steps taken by the school staff are related to the type of
school? (Adapted from Horatio Alger Association)


Public 40 51
Private 64 34
15. Trying to Quit Smoking The contingency table shows the number of times
a random sample of former smokers tried to quit smoking before they were
habit-free and gender. At $\alpha = 0.05$, can you conclude that the number of
times they tried to quit before they were habit-free is related to gender?
(Adapted from Porter Novelli Health Styles for the American Lung Association)


| Male 
Female 146 139 80 


16. Reviewing a Movie The contingency table shows how a random sample of 
adults rated a newly released movie and gender. At $\alpha = 0.05$, can you
conclude that the adults’ ratings are related to gender? 


Male 
| Female | 101 33 25 11 


17. Obsessive-Compulsive Disorder The results of a random sample of patients 
with obsessive-compulsive disorder treated with a drug or with a placebo are 
shown in the contingency table. At $\alpha = 0.10$, can you conclude that the
treatment is related to the result? On the basis of these results, would you 
recommend using the drug as part of a treatment for obsessive-compulsive 
disorder? (Adapted from The Journal of the American Medical Association) 


Improvement 


No change 54 70 


18. Musculoskeletal Injury The results of a random sample of children with 
pain from musculoskeletal injuries treated with acetaminophen, ibuprofen, 
or codeine are shown in the contingency table. At $\alpha = 0.10$, can you conclude
that the treatment is related to the result? (Adapted from American Academy
of Pediatrics)
| 58 | os | «0 | 


Slight improvement 42 19 39 


Significant improvement 
19. Continuing Education You work for a college’s continuing education
department and want to determine whether the reasons given by workers
for continuing their education are related to job type. In your study, you
randomly collect the data shown in the contingency table. At $\alpha = 0.01$, can
you conclude that the reason and the type of worker are dependent? How
could you use this information in your marketing efforts? (Adapted from
Market Research Institute for George Mason University)


Technical 30 36 41
Other 47 25 30


20. Ages and Goals You are investigating the relationship between the ages of
U.S. adults and what aspect of career development they consider to be the
most important. You randomly collect the data shown in the contingency
table. At $\alpha = 0.01$, is there enough evidence to conclude that age is related
to which aspect of career development is considered to be most important?
(Adapted from Harris Interactive)


18-26 years 31 22 21
27-41 years 27 31 33
42-61 years 19 14 8


21. Vehicles and Crashes You work for an insurance company and are
studying the relationship between types of crashes and the vehicles
involved in passenger vehicle occupant deaths. As part of your study, you
randomly select 4270 vehicle crashes and organize the resulting data as
shown in the contingency table. At $\alpha = 0.05$, can you conclude that the type
of crash depends on the type of vehicle? (Adapted from Insurance Institute for
Highway Safety)


Single-vehicle 1237 547 479
Multiple-vehicle 1453 307 247


22. Library Internet Access Speed The contingency table shows a random
sample of urban, suburban, and rural libraries and the speed of their Internet
access. In the table, mbps represents megabits per second. At $\alpha = 0.01$, can
you conclude that the metropolitan status of libraries and Internet access
speed are related? (Adapted from Center for Library and Information Innovation)


1.4 mbps or less 5 20 58
1.5 mbps to 3.0 mbps 24 46 65
Greater than 3.0 mbps 37 59 64
In Exercises 23 and 24, use StatCrunch to (a) find the marginal frequencies,
(b) find the expected frequencies for each cell in the contingency table, and 
(c) perform the indicated chi-square test for independence. 


23. Financing and Education A financial aid officer is studying the relationship 
between family decisions to borrow money to finance their child’s education 
and their child’s expected income after graduation. As part of the study, 440 
families are randomly selected and the resulting data are organized as shown 
in the contingency table. At $\alpha = 0.01$, can you conclude that the decision to
borrow money is related to the child’s expected income after graduation? 
(Adapted from Sallie Mae, Inc.) 


Less than $35,000 37 10 22 25
$35,000 to $50,000 28 12 15 16
$50,000 to $100,000 55 9 65 48
Greater than $100,000 36 1 29 32


24. Alcohol-Related Accidents The contingency table shows the results of a 
random sample of fatally injured passenger vehicle drivers (with blood 
alcohol concentrations greater than or equal to 0.08) by age and gender. 
At $\alpha = 0.05$, can you conclude that age is related to gender in such
alcohol-related accidents? (Adapted from Insurance Institute for Highway Safety) 


Male 45 170 90 72 45 26
Female 9 30 21 17 10 5 


EXTENDING CONCEPTS


Homogeneity of Proportions Test In Exercises 25-28, use the following
information. Another chi-square test that involves a contingency table is the
homogeneity of proportions test. This test is used to determine if several
proportions are equal when samples are taken from different populations. Before
the populations are sampled and the contingency table is made, the sample sizes
are determined. After randomly sampling different populations, you can test
whether the proportion of elements in a category is the same for each population
using the same guidelines in performing a chi-square independence test. The null
and alternative hypotheses are always some variation of the following statements.


$H_0$: The proportions are equal.
$H_a$: At least one of the proportions is different from the others.

Performing a homogeneity of proportions test requires that the observed
frequencies be obtained using a random sample, and each expected frequency must
be greater than or equal to 5.
25. Motor Vehicle Crash Deaths The contingency table shows the results of a
random sample of motor vehicle crash deaths by age and gender. At
$\alpha = 0.05$, perform a homogeneity of proportions test on the claim that the
proportions of motor vehicle crash deaths involving males or females are the
same for each age group. (Adapted from Insurance Institute for Highway Safety)


Female 46 28 28 32


| Male 56 31 | 2 | #14
| Female 22 18 |


26. Obsessive-Compulsive Disorder The contingency table shows the results of
a random sample of patients with obsessive-compulsive disorder after being
treated with a drug or with a placebo. At $\alpha = 0.10$, perform a homogeneity
of proportions test on the claim that the proportions of the results for drug
and placebo treatments are the same. (Adapted from The Journal of the
American Medical Association)


Improvement 39 25
No change 34 70


27. Is the chi-square homogeneity of proportions test a left-tailed, right-tailed, or
two-tailed test?


28. Explain how the chi-square test for independence is different from the
chi-square homogeneity of proportions test.


Contingency Tables and Relative Frequencies In Exercises 29-31, use
the following information. 


The frequencies in a contingency table can be written as relative frequencies 
by dividing each frequency by the sample size. The contingency table below 
shows the number of U.S. adults (in millions) ages 25 and over by employment 
status and educational attainment. (Adapted from U.S. Census Bureau) 


Employed 10.8 35.9 22.3 56.9
Unemployed 1.2 2.2 1.0 1.4
Not in the labor force 14.3 23.1 10.5 16.7


29. Rewrite the contingency table using relative frequencies.


30. What percent of U.S. adults ages 25 and over 


(a) have a degree and are unemployed? 


(b) have some college education, but no degree, and are not in the labor 
force? 


(c) are employed and high school graduates? 
(d) are not in the labor force? 
(e) are high school graduates? 


31. Explain why you cannot perform the chi-square independence test on these
data.


Conditional Relative Frequencies In Exercises 32-39, use the contingency
table from Exercises 29-31, and the following information. 


Relative frequencies can also be calculated based on the row totals (by 
dividing each row entry by the row’s total) or the column totals (by dividing 
each column entry by the column’s total). These frequencies are conditional 
relative frequencies and can be used to determine if an association exists 
between two categories in a contingency table. 
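
The three kinds of relative frequency described in this exercise set differ only in the divisor. The sketch below illustrates them in Python; the function names and the 2 by 2 table of counts are hypothetical, chosen purely for illustration, and are not the Census data used in the exercises.

```python
def relative_frequencies(table):
    """Divide every cell by the overall sample size."""
    n = sum(sum(row) for row in table)
    return [[x / n for x in row] for row in table]


def row_conditional(table):
    """Divide every cell by the total of its row."""
    return [[x / sum(row) for x in row] for row in table]


def column_conditional(table):
    """Divide every cell by the total of its column."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[x / t for x, t in zip(row, col_totals)] for row in table]


# Hypothetical counts (illustration only): 2 rows by 2 columns.
counts = [
    [30, 10],
    [20, 40],
]

print(relative_frequencies(counts))  # each cell as a share of all 100 observations
print(row_conditional(counts))       # each row sums to 1
print(column_conditional(counts))    # each column sums to 1
```

If the row-conditional frequencies differ noticeably from one row to the next, that suggests an association between the two variables.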


32. Calculate the conditional relative frequencies in the contingency table based
on the row totals.


33. What percent of U.S. adults ages 25 and over who are employed have a
degree?


34. What percent of U.S. adults ages 25 and over who are not in the labor force
have some college education, but no degree?


35. Calculate the conditional relative frequencies in the contingency table based
on the column totals.


36. What percent of U.S. adults ages 25 and over who have a degree are not in
the labor force?


37. What percent of U.S. adults ages 25 and over who are not high school
graduates are unemployed?


38. Use your results from Exercise 35 to construct a bar graph that shows the
percentages of U.S. adults ages 25 and over based on employment status.
Each category of employment status will have four bars, representing the
four levels of educational attainment mentioned in the contingency table.


39. What conclusions can you make from the bar graph you constructed in
Exercise 38?
Fast Food Survey


With the growing trend toward healthier eating, fast food chains are revising their menus. Some chains
have added healthier options, such as salads, while other chains are grilling foods instead of frying them.
QSR Magazine conducted a recent survey of 673 U.S. consumers regarding their attitudes and
preferences about fast food.


One question in the survey asks:


Do you agree that, on the whole, fast food menus have gotten healthier over the past 3 years?


The pie chart shows the response to the question on a national level. The contingency table shows the
results classified by gender and response.


[Pie chart: Are Fast Food Menus Healthier? Somewhat agree 59%; Neither agree nor disagree 20%; Strongly agree 15%; Disagree 9%]


                               Gender
Response                       Female   Male
Somewhat agree                    286    114
Neither agree nor disagree         76     58
Strongly agree                     62     19
Disagree                           34     24


EXERCISES


1. Assuming the variables gender and response 
are independent, did female respondents or 
male respondents exceed the expected 
number of “somewhat agree” responses? 


2. Assuming the variables gender and response 
are independent, did female respondents or 
male respondents exceed the expected number 
of “neither agree nor disagree” responses? 


3. At $\alpha = 0.01$, perform a chi-square independence
test to determine whether the variables
response and gender are independent. What
can you conclude?


In Exercises 4 and 5, perform a chi-square 
goodness-of-fit test to compare the national 
distribution of responses with the distribution of 
each gender. Use the national distribution as the 
claimed distribution. Use $\alpha = 0.05$.


4. Compare the distribution of responses by
females with the national distribution. What
can you conclude?


5. Compare the distribution of responses by
males with the national distribution. What can
you conclude?


6. In addition to the variables used in the Case
Study, what other variables do you think are
important to consider when studying the
distribution of U.S. consumers’ attitudes about
healthy fast food?
Comparing Two Variances


WHAT YOU SHOULD LEARN
> How to interpret the F-distribution and use an F-table to find critical values
> How to perform a two-sample F-test to compare two variances

The F-Distribution > The Two-Sample F-Test for Variances


> THE F-DISTRIBUTION

In Chapter 8, you learned how to perform hypothesis tests to compare
population means and population proportions. Recall from Section 8.2 that the
t-test for the difference between two population means depends on whether the
population variances are equal. To determine whether the population variances
are equal, you can perform a two-sample F-test.

In this section, you will learn about the F-distribution and how it can be used
to compare two variances.


DEFINITION


Let $s_1^2$ and $s_2^2$ represent the sample variances of two different populations.
If both populations are normal and the population variances $\sigma_1^2$ and $\sigma_2^2$ are
equal, then the sampling distribution of

$$F = \frac{s_1^2}{s_2^2}$$

is called an F-distribution. Several properties of the F-distribution are as
follows.


1. The F-distribution is a family of curves, each of which is determined by two
types of degrees of freedom: the degrees of freedom corresponding to the
variance in the numerator, denoted by d.f.$_N$, and the degrees of freedom
corresponding to the variance in the denominator, denoted by d.f.$_D$.
2. F-distributions are positively skewed.
3. The total area under each curve of an F-distribution is equal to 1.
4. F-values are always greater than or equal to 0.
5. For all F-distributions, the mean value of F is approximately equal to 1.
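
Several of these properties can be seen in a quick simulation. The sketch below is an illustration under stated assumptions, not part of the text's method: it draws pairs of samples from two normal populations with equal variances, forms the ratio of sample variances each time, and summarizes the ratios. The sample sizes (9 and 27, giving d.f.$_N$ = 8 and d.f.$_D$ = 26), the number of trials, and the seed are arbitrary choices.

```python
import random
import statistics


def simulate_variance_ratios(n1, n2, trials=5000, seed=1):
    """Simulate s1^2 / s2^2 for samples drawn from two normal populations
    that share the same variance (here both are standard normal)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        sample1 = [rng.gauss(0, 1) for _ in range(n1)]
        sample2 = [rng.gauss(0, 1) for _ in range(n2)]
        # statistics.variance is the sample variance (divides by n - 1).
        ratios.append(statistics.variance(sample1) / statistics.variance(sample2))
    return ratios


ratios = simulate_variance_ratios(n1=9, n2=27)

print("smallest ratio:", min(ratios))          # never negative
print("mean ratio:", statistics.mean(ratios))  # close to 1
print("median ratio:", statistics.median(ratios))
```

A mean that sits above the median is what a positively skewed distribution looks like in a summary: the long right tail pulls the mean upward.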


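[Figure: F-Distributions. Curves are shown for d.f.$_N$ = 1 and d.f.$_D$ = 8; d.f.$_N$ = 8 and d.f.$_D$ = 26; d.f.$_N$ = 16 and d.f.$_D$ = 7; d.f.$_N$ = 3 and d.f.$_D$ = 11. The horizontal axis is F, marked from 1 to 4.]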
Table 7 in Appendix B lists the critical values for the F-distribution for
selected levels of significance $\alpha$ and degrees of freedom d.f.$_N$ and d.f.$_D$.


Finding Critical Values for the F-Distribution

1. Specify the level of significance $\alpha$.
2. Determine the degrees of freedom for the numerator d.f.$_N$.
3. Determine the degrees of freedom for the denominator d.f.$_D$.
4. Use Table 7 in Appendix B to find the critical value. If the hypothesis
test is
a. one-tailed, use the $\alpha$ F-table.
b. two-tailed, use the $\frac{1}{2}\alpha$ F-table.


In the sampling distribution of $F = s_1^2 / s_2^2$, the larger variance
is always in the numerator. So, F is always greater than or equal to 1. As
such, all one-tailed tests are right-tailed tests, and for all two-tailed tests,
you need only to find the right-tailed critical value.

EXAMPLE 1 


> Finding Critical F-Values for a Right-Tailed Test 


Find the critical F-value for a right-tailed test when $\alpha = 0.10$, d.f.$_N$ = 5, and
d.f.$_D$ = 28.


> Solution 


A portion of Table 7 is shown below. Using the $\alpha = 0.10$ F-table with d.f.$_N$ = 5
and d.f.$_D$ = 28, you can find the critical value, as shown by the highlighted
areas in the table.


$\alpha = 0.10$

d.f.$_D$: Degrees of             d.f.$_N$: Degrees of freedom, numerator
freedom, denominator      1      2      3      4      5      6      7      8
 1                    39.86  49.50  53.59  55.83  57.24  58.20  58.91  59.44
 2                     8.53   9.00   9.16   9.24   9.29   9.33   9.35   9.37
 ⋮
26                     2.91   2.52   2.31   2.17   2.08   2.01   1.96   1.92
27                     2.90   2.51   2.30   2.17   2.07   2.00   1.95   1.91
28                     2.89   2.50   2.29   2.16   2.06   2.00   1.94   1.90
29                     2.89   2.50   2.28   2.15   2.06   1.99   1.93   1.89
30                     2.88   2.49   2.28   2.14   2.05   1.98   1.93   1.88


From the table, you can see that the critical value is $F_0 = 2.06$. The graph
at the left shows the F-distribution for $\alpha = 0.10$, d.f.$_N$ = 5, d.f.$_D$ = 28, and
$F_0 = 2.06$.
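
When a printed table is not at hand, a critical F-value can be approximated numerically. The sketch below is one possible approach, not the table lookup the text describes: it evaluates the right-tail area of the F-distribution through the regularized incomplete beta function (computed with a standard continued-fraction expansion) and then bisects on the F-value until the tail area matches the requested level. All function names are assumptions made for this illustration.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the expansion.
        term = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + term * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + term / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the expansion.
        term = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + term * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + term / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_right_tail_area(f_value, df_n, df_d):
    """Area under the F(df_n, df_d) curve to the right of f_value."""
    x = df_d / (df_d + df_n * f_value)
    return regularized_incomplete_beta(df_d / 2.0, df_n / 2.0, x)


def f_critical_value(alpha, df_n, df_d, two_tailed=False):
    """F-value whose right-tail area is alpha (or alpha / 2 for a two-tailed test)."""
    area = alpha / 2.0 if two_tailed else alpha
    low, high = 0.0, 1.0
    while f_right_tail_area(high, df_n, df_d) > area:
        high *= 2.0  # widen the bracket until the tail area drops below the target
    for _ in range(100):
        mid = (low + high) / 2.0
        if f_right_tail_area(mid, df_n, df_d) > area:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Example 1: right-tailed test, alpha = 0.10, d.f.N = 5, d.f.D = 28.
print(round(f_critical_value(0.10, 5, 28), 2))
```

Rounded to two decimal places, the result should agree with the table entry used in Example 1. Passing `two_tailed=True` halves the level of significance first, which mirrors using the $\frac{1}{2}\alpha$ F-table.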


> Try It Yourself 1 


Find the critical F-value for a right-tailed test when $\alpha = 0.05$, d.f.$_N$ = 8, and
d.f.$_D$ = 20.


a. Specify the level of significance $\alpha$.
b. Use Table 7 in Appendix B to find the critical value. Answer: Page A46
When performing a two-tailed hypothesis test using the F-distribution, you need
only to find the right-tailed critical value. You must, however, remember to use
the $\frac{1}{2}\alpha$ F-table.


STUDY TIP
When using Table 7 in Appendix B to find a critical value, you will
notice that some of the values for d.f.$_N$ or d.f.$_D$ are not included in
the table. If d.f.$_N$ or d.f.$_D$ is exactly midway between two values in
the table, then use the critical value midway between the corresponding
critical values. In some cases, though, it is easier to use a technology tool
to calculate the P-value, compare it to the level of significance, and then
decide whether to reject the null hypothesis.


EXAMPLE 2


> Finding Critical F-Values for a Two-Tailed Test

Find the critical F-value for a two-tailed test when $\alpha = 0.05$, d.f.$_N$ = 4, and
d.f.$_D$ = 8.


> Solution

A portion of Table 7 is shown below. Using the

$$\tfrac{1}{2}\alpha = \tfrac{1}{2}(0.05) = 0.025$$

F-table with d.f.$_N$ = 4 and d.f.$_D$ = 8, you can find the critical value, as shown
by the highlighted areas in the table.


$\alpha = 0.025$

d.f.$_D$: Degrees of             d.f.$_N$: Degrees of freedom, numerator
freedom, denominator      1      2      3      4      5      6      7      8
1                     647.8  799.5  864.2  899.6  921.8  937.1  948.2  956.7
2                     38.51  39.00  39.17  39.25  39.30  39.33  39.36  39.37
3                     17.44  16.04  15.44  15.10  14.88  14.73  14.62  14.54
4                     12.22  10.65   9.98   9.60   9.36   9.20   9.07   8.98
5                     10.01   8.43   7.76   7.39   7.15   6.98   6.85   6.76
6                      8.81   7.26   6.60   6.23   5.99   5.82   5.70   5.60
7                      8.07   6.54   5.89   5.52   5.29   5.12   4.99   4.90
8                      7.57   6.06   5.42   5.05   4.82   4.65   4.53   4.43
9                      7.21   5.71   5.08   4.72   4.48   4.32   4.20   4.10


From the table, the critical value is $F_0 = 5.05$. The graph shows the
F-distribution for $\frac{1}{2}\alpha = 0.025$, d.f.$_N$ = 4, d.f.$_D$ = 8, and $F_0 = 5.05$.


[Figure: F-distribution curve with a right-tail area of $\frac{1}{2}\alpha = 0.025$ beyond $F_0 = 5.05$]


> Try It Yourself 2
Find the critical F-value for a two-tailed test when $\alpha = 0.01$, d.f.$_N$ = 2, and


a. Specify the level of significance $\alpha$.
b. Use Table 7 in Appendix B with $\frac{1}{2}\alpha$ to find the critical value.
Answer: Page A46
> THE TWO-SAMPLE F-TEST FOR VARIANCES
In the remainder of this section, you will learn how to perform a two-sample
F-test for comparing two population variances using a sample from each
population. Such a test has three conditions that must be met.

1. The samples must be randomly selected.

2. The samples must be independent.

3. Each population must have a normal distribution.


If these requirements are met, you can use the F-test to compare the population
variances $\sigma_1^2$ and $\sigma_2^2$.


TWO-SAMPLE F-TEST FOR VARIANCES


A two-sample F-test is used to compare two population variances $\sigma_1^2$ and $\sigma_2^2$
when a sample is randomly selected from each population. The
populations must be independent and normally distributed. The test statistic is

$$F = \frac{s_1^2}{s_2^2}$$

where $s_1^2$ and $s_2^2$ represent the sample variances with $s_1^2 \ge s_2^2$. The numerator
has d.f.$_N$ = $n_1 - 1$ degrees of freedom and the denominator has
d.f.$_D$ = $n_2 - 1$ degrees of freedom, where $n_1$ is the size of the sample having
variance $s_1^2$ and $n_2$ is the size of the sample having variance $s_2^2$.


GUIDELINES 


Using a Two-Sample F-Test to Compare $\sigma_1^2$ and $\sigma_2^2$


IN WORDS                                        IN SYMBOLS
1. Identify the claim. State the null and       State $H_0$ and $H_a$.
   alternative hypotheses.
2. Specify the level of significance.           Identify $\alpha$.
3. Determine the degrees of freedom.            d.f.$_N$ = $n_1 - 1$
                                                d.f.$_D$ = $n_2 - 1$
4. Determine the critical value.                Use Table 7 in Appendix B.
5. Determine the rejection region.
6. Find the test statistic and sketch           $F = \frac{s_1^2}{s_2^2}$
   the sampling distribution.
7. Make a decision to reject or fail to         If F is in the rejection region,
   reject the null hypothesis.                  reject $H_0$. Otherwise, fail to
                                                reject $H_0$.
8. Interpret the decision in the context of
   the original claim.
EXAMPLE 3


> Performing a Two-Sample F-Test


A restaurant manager is designing a system that is intended to decrease the
variance of the time customers wait before their meals are served. Under the
old system, a random sample of 10 customers had a variance of 400. Under the
new system, a random sample of 21 customers had a variance of 256. At
$\alpha = 0.10$, is there enough evidence to convince the manager to switch to the
new system? Assume both populations are normally distributed.


> Solution Because 400 > 256, $s_1^2 = 400$ and $s_2^2 = 256$. Therefore, $s_1^2$
and $\sigma_1^2$ represent the sample and population variances for the old system,
respectively. With the claim “the variance of the waiting times under the new
system is less than the variance of the waiting times under the old system,” the
null and alternative hypotheses are

$H_0$: $\sigma_1^2 \le \sigma_2^2$ and $H_a$: $\sigma_1^2 > \sigma_2^2$. (Claim)

Because the test is a right-tailed test with $\alpha = 0.10$, d.f.$_N$ = $n_1 - 1 = 10 - 1 = 9$,
and d.f.$_D$ = $n_2 - 1 = 21 - 1 = 20$, the critical value is $F_0 = 1.96$.
So, the rejection region is $F > 1.96$. With the F-test, the test statistic is

$$F = \frac{s_1^2}{s_2^2} = \frac{400}{256} \approx 1.56.$$

The graph shows the location of the rejection region and the test statistic. Because
F is not in the rejection region, you should fail to reject the null hypothesis.

[Figure: F-distribution with a rejection region of area $\alpha = 0.10$ to the right of $F_0 = 1.96$; the test statistic $F \approx 1.56$ lies to the left of $F_0$]


Interpretation There is not enough evidence at the 10% level of significance
to convince the manager to switch to the new system.


Does location have an effect on the variance of real estate selling
prices? A random sample of selling prices (in thousands of
dollars) of condominiums sold in south Florida is shown in
the table. The first column represents the selling prices of
condominiums in Miami, and the second column lists the selling
prices of condominiums in Fort Lauderdale. (Adapted from Florida
Realtors and the University of Florida Bergstrom Center for Real Estate Studies)

Miami    Fort Lauderdale
139.0    85.5
138.8    80.9
135.5    91.2
150.9    70
155.0    78.0
154.7    69.9
149.9    70.5
150.5    73.6
134.5    105.9
125.0    70.0

Assuming each population of selling prices is normally
distributed, is it possible to use a two-sample F-test to
compare the population variances?
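
The arithmetic in Example 3 is short enough to check in a few lines. The following Python sketch is only an illustration; the function name is an assumption, and the critical value 1.96 is the table value quoted in the example rather than something the code computes. The function places the larger sample variance in the numerator, as the test requires, and returns the matching degrees of freedom.

```python
def two_sample_f_statistic(var_a, n_a, var_b, n_b):
    """Return (F, d.f.N, d.f.D) with the larger sample variance in the numerator."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Example 3: old system (variance 400, n = 10), new system (variance 256, n = 21).
f_stat, df_n, df_d = two_sample_f_statistic(400, 10, 256, 21)

critical_value = 1.96  # F0 from Table 7 for alpha = 0.10, d.f.N = 9, d.f.D = 20
reject_null = f_stat > critical_value  # right-tailed rejection region F > F0

print(f_stat, df_n, df_d, reject_null)
```

Because the function orders the variances itself, the samples can be passed in either order and the same test statistic results.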


> Try It Yourself 3 


A medical researcher claims that a specially treated intravenous solution
decreases the variance of the time required for nutrients to enter the
bloodstream. Independent samples from each type of solution are randomly
selected, and the results are shown in the table at the left. At $\alpha = 0.01$, is there
enough evidence to support the researcher’s claim? Assume the populations
are normally distributed.

[Table at the left: $s^2 = 180$ | $s^2 = 56$]


a. Identify the claim and state $H_0$ and $H_a$.
b. Specify the level of significance $\alpha$.
c. Determine the degrees of freedom for the numerator and for the denominator.
d. Determine the critical value and the rejection region.
e. Use the F-test to find the test statistic F. Sketch a graph.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.
Answer: Page A46
EXAMPLE 4


> Using Technology for a Two-Sample F-Test


You want to purchase stock in a company and are deciding between two
different stocks. Because a stock’s risk can be associated with the standard
deviation of its daily closing prices, you randomly select samples of the daily
closing prices for each stock to obtain the results shown at the left. At
$\alpha = 0.05$, can you conclude that one of the two stocks is a riskier investment?
Assume the stock closing prices are normally distributed.

Stock A        Stock B
$n_2 = 30$     $n_1 = 31$
$s_2 = 3.5$    $s_1 = 5.7$


> Solution


Because $5.7^2 > 3.5^2$, $s_1^2 = 5.7^2$ and $s_2^2 = 3.5^2$. Therefore, $s_1^2$ and $\sigma_1^2$ represent
the sample and population variances for Stock B, respectively. With the
claim “one of the two stocks is a riskier investment,” the null and alternative
hypotheses are

$H_0$: $\sigma_1^2 = \sigma_2^2$ and $H_a$: $\sigma_1^2 \ne \sigma_2^2$. (Claim)

Because the test is a two-tailed test with $\frac{1}{2}\alpha = \frac{1}{2}(0.05) = 0.025$,
d.f.$_N$ = $n_1 - 1 = 31 - 1 = 30$, and d.f.$_D$ = $n_2 - 1 = 30 - 1 = 29$, the critical value
is $F_0 = 2.09$. So, the rejection region is $F > 2.09$.

To perform a two-sample F-test using a TI-83/84 Plus, begin with the STAT
keystroke. Choose the TESTS menu and select D:2-SampFTest. Then set up
the two-sample F-test as shown in the first screen below. Because you are
entering the descriptive statistics, select the Stats input option. When entering
the original data, select the Data input option. The other displays below show
the results of selecting Calculate or Draw.

[TI-83/84 Plus screens: test setup, Calculate results, and Draw results, with P = .0102]

The test statistic $F \approx 2.652$ is in the rejection region, so you should reject the
null hypothesis.

Interpretation There is enough evidence at the 5% level of significance to
support the claim that one of the two stocks is a riskier investment.


STUDY TIP
You can also use a P-value to perform a two-sample F-test.
For instance, in Example 4, note that the TI-83/84 Plus
displays P = .0102172459. Because $P < \alpha$, you should reject the null
hypothesis.
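
The P-value reported by the calculator in Example 4 can be approximated without special software. The sketch below is a rough numerical illustration, not the calculator's algorithm: it integrates the F-distribution density from 0 to the test statistic with Simpson's rule and doubles the remaining right-tail area because the test is two-tailed. The function names and the number of integration steps are assumptions, and the density helper assumes d.f.$_N$ > 2 so that the curve starts at height 0.

```python
import math


def f_density(x, df_n, df_d):
    """Density of the F-distribution (assumes df_n > 2, so the density is 0 at x = 0)."""
    if x <= 0.0:
        return 0.0
    log_beta = (math.lgamma(df_n / 2) + math.lgamma(df_d / 2)
                - math.lgamma((df_n + df_d) / 2))
    log_pdf = ((df_n / 2) * math.log(df_n / df_d)
               + (df_n / 2 - 1) * math.log(x)
               - ((df_n + df_d) / 2) * math.log1p(df_n * x / df_d)
               - log_beta)
    return math.exp(log_pdf)


def f_area_left_of(f_value, df_n, df_d, steps=2000):
    """Area under the curve from 0 to f_value by Simpson's rule (steps must be even)."""
    h = f_value / steps
    total = f_density(0.0, df_n, df_d) + f_density(f_value, df_n, df_d)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * f_density(i * h, df_n, df_d)
    return total * h / 3


# Example 4: Stock B (s = 5.7, n = 31) has the larger variance, so it is sample 1.
s1, n1 = 5.7, 31
s2, n2 = 3.5, 30

f_stat = s1 ** 2 / s2 ** 2
right_tail = 1 - f_area_left_of(f_stat, n1 - 1, n2 - 1)
p_value = 2 * right_tail  # two-tailed test

alpha = 0.05
print(f_stat, p_value, p_value < alpha)
```

The doubling step relies on the larger variance being in the numerator, so that the observed F is at least 1 and the right tail is the smaller of the two tails.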
> Try It Yourself 4

A biologist claims that the pH levels of the soil in two geographic locations
have equal standard deviations. Independent samples from each location are
randomly selected, and the results are shown at the left. At $\alpha = 0.01$, is there
enough evidence to reject the biologist’s claim? Assume the pH levels are
normally distributed.

Location A     Location B
n = 16         n = 22
s = 0.95       s = 0.78


a. Identify the claim and state $H_0$ and $H_a$.
b. Specify the level of significance $\alpha$.
c. Determine the degrees of freedom for the numerator and for the denominator.
d. Determine the critical value and the rejection region.
e. Use a technology tool to find the test statistic F.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.

Answer: Page A46
BUILDING BASIC SKILLS AND VOCABULARY


1. Explain how to find the critical value for an F-test.
2. List five properties of the F-distribution.
3. List the three conditions that must be met in order to use a two-sample F-test.
4. Explain how to determine the values of d.f.$_N$ and d.f.$_D$ when performing a
two-sample F-test.


In Exercises 5-8, find the critical F-value for a right-tailed test using the indicated
level of significance $\alpha$ and degrees of freedom d.f.$_N$ and d.f.$_D$.


5. $\alpha = 0.05$, d.f.$_N$ = 9, d.f.$_D$ = 16
6. $\alpha = 0.01$, d.f.$_N$ = 2, d.f.$_D$ = 11
7. $\alpha = 0.10$, d.f.$_N$ = 10, d.f.$_D$ = 15
8. $\alpha = 0.025$, d.f.$_N$ = 7, d.f.$_D$ = 3


In Exercises 9-12, find the critical F-value for a two-tailed test using the indicated
level of significance $\alpha$ and degrees of freedom d.f.$_N$ and d.f.$_D$.


9. $\alpha = 0.01$, d.f.$_N$ = 6, d.f.$_D$ = 7
10. $\alpha = 0.10$, d.f.$_N$ = 24, d.f.$_D$ = 28
11. $\alpha = 0.05$, d.f.$_N$ = 60, d.f.$_D$ = 40
12. $\alpha = 0.05$, d.f.$_N$ = 27, d.f.$_D$ = 19


In Exercises 13-18, test the claim about the difference between two population
variances $\sigma_1^2$ and $\sigma_2^2$ at the given level of significance $\alpha$ using the given sample
statistics. Assume the sample statistics are from independent samples that are
randomly selected and each population has a normal distribution.


13. Claim: $\sigma_1^2 > \sigma_2^2$; $\alpha = 0.10$.
Sample statistics: $s_1^2 = 773$, $n_1 = 5$; $s_2^2 = 765$, $n_2 = 6$

14. Claim: $\sigma_1^2 = \sigma_2^2$; $\alpha = 0.05$.
Sample statistics: $s_1^2 = 310$, $n_1 = 7$; $s_2^2 = 297$, $n_2 = 8$

15. Claim: $\sigma_1^2 = \sigma_2^2$; $\alpha = 0.01$.
Sample statistics: $s_1^2 = 842$, $n_1 = 11$; $s_2^2 = 836$, $n_2 = 10$

16. Claim: $\sigma_1^2 \ne \sigma_2^2$; $\alpha = 0.05$.
Sample statistics: $s_1^2 = 245$, $n_1 = 31$; $s_2^2 = 112$, $n_2 = 28$

17. Claim: $\sigma_1^2 = \sigma_2^2$; $\alpha = 0.01$.
Sample statistics: $s_1^2 = 9.8$, $n_1 = 13$; $s_2^2 = 2.5$, $n_2 = 20$

18. Claim: $\sigma_1^2 > \sigma_2^2$; $\alpha = 0.05$.
Sample statistics: $s_1^2 = 44.6$, $n_1 = 16$; $s_2^2 = 39.3$, $n_2 = 12$


USING AND INTERPRETING CONCEPTS


Comparing Two Variances In Exercises 19-26, (a) identify the claim and
state $H_0$ and $H_a$, (b) determine the critical value and the rejection region, (c) find
the test statistic F, (d) decide whether to reject or fail to reject the null hypothesis,
and (e) interpret the decision in the context of the original claim. If convenient, use
technology to solve the problem. In each exercise, assume the samples are
independent and each population has a normal distribution.


19. Life of Appliances Company A claims that the variance of the life of its 
appliances is less than the variance of the life of Company B’s appliances. A 
random sample of the lives of 20 of Company A’s appliances has a variance 
of 2.6. A random sample of the lives of 25 of Company B’s appliances has a 
variance of 2.8. At $\alpha = 0.05$, can you support Company A’s claim?


20. Fuel Consumption An automobile manufacturer claims that the variance of
the fuel consumption for its hybrid vehicles is less than the variance of the fuel
consumption for the hybrid vehicles of a top competitor. A random sample of
the fuel consumption of 19 of the manufacturer’s hybrids has a variance of 0.24.
A random sample of the fuel consumption of 21 of its competitor’s hybrids has
a variance of 0.77. At $\alpha = 0.01$, can you support the manufacturer’s claim?
(Adapted from GreenHybrid)


21. Home Theater Prices The table shows the prices (in dollars) for a random
sample of home theater systems. At $\alpha = 0.05$, can you conclude that the
variances of the prices differ between the two companies? (Adapted from
Best Buy)

TABLE FOR EXERCISE 21
Company A Company B
250 350 450 400 350
650 550 250 350 190
285 550


22. Ice Cream and Calories The table shows the numbers of calories in a
serving for a random sample of ice cream flavors for two brands. At
$\alpha = 0.10$, can you conclude that the variances of the numbers of calories
differ between the two brands? (Source: Perry’s Ice Cream and Ben & Jerry’s
Homemade, Inc.)

TABLE FOR EXERCISE 22
Brand A        Brand B
150 170 140    270 250 300
160 140 130    250 290 280
160 160 130    240 310 240
180            330


23. Science Assessment Tests In a recent interview, a state school administrator 
stated that the standard deviations of science assessment test scores for 
eighth grade students are the same in Districts 1 and 2. A random sample of 
12 test scores from District 1 has a standard deviation of 36.8 points, and a 
random sample of 14 test scores from District 2 has a standard deviation of 
32.5 points. At α = 0.10, can you reject the administrator’s claim? (Adapted 
from National Center for Educational Statistics) 


24. U.S. History Assessment Tests A school administrator reports that the 
standard deviations of U.S. history assessment test scores for eighth grade 
students are the same in Districts 1 and 2. As proof, the administrator gives the 
results of a study of test scores in each district. The study shows that a random 
sample of 10 test scores from District 1 has a standard deviation of 33.9 points, 
and a random sample of 13 test scores from District 2 has a standard deviation 
of 30.2 points. At α = 0.01, can you reject the administrator’s claim? (Adapted 
from National Center for Educational Statistics) 


25. Annual Salaries The annual salaries for a random sample of 16 actuaries 
working in New York have a standard deviation of $14,900. The annual 
salaries for a random sample of 17 actuaries working in California have a 
standard deviation of $9600. At α = 0.05, can you conclude that the standard 
deviation of the annual salaries for actuaries is greater in New York than in 
California? (Adapted from America’s Career InfoNet) 


26. Annual Salaries An employment information service claims the standard 
deviation of the annual salaries for public relations managers is greater in 
Florida than in Louisiana. The annual salaries for a random sample of 28 
public relations managers in Florida have a standard deviation of $10,100. 
The annual salaries for a random sample of 24 public relations managers in 
Louisiana have a standard deviation of $6400. At α = 0.05, can you support 
the service’s claim? (Adapted from America’s Career InfoNet) 


In Exercises 27-30, use StatCrunch to test the claim about the difference 
between two population variances $\sigma_1^2$ and $\sigma_2^2$ at the given level of significance α 
using the given sample statistics. Assume the sample statistics are from independent 
samples that are randomly selected and each population has a normal distribution. 


27. Claim: $\sigma_1^2 = \sigma_2^2$; α = 0.10. 
    Sample statistics: $s_1^2 = 156.25$, $n_1 = 15$; $s_2^2 = 295.84$, $n_2 = 18$ 

28. Claim: $\sigma_1^2 \neq \sigma_2^2$; α = 0.05. 
    Sample statistics: $s_1^2 = 31.36$, $n_1 = 24$; $s_2^2 = 11.56$, $n_2 = 20$ 

29. Claim: $\sigma_1^2 = \sigma_2^2$; α = 0.05. 
    Sample statistics: $s_1^2 = 416.16$, $n_1 = 22$; $s_2^2 = 193.21$, $n_2 = 29$ 

30. Claim: $\sigma_1^2 > \sigma_2^2$; α = 0.01. 
    Sample statistics: $s_1^2 = 828$, $n_1 = 7$; $s_2^2 = 697$, $n_2 = 13$ 


EXTENDING CONCEPTS 


Finding Left-Tailed Critical F-Values In this section you learned that if $s_1^2$ is 
larger than $s_2^2$, then you only need to calculate the right-tailed critical F-value for a 
two-tailed test. For other applications of the F-distribution, you will need to 
calculate the left-tailed critical F-value. To calculate the left-tailed critical F-value, 
do the following. 


(1) Interchange the values for d.f.N and d.f.D. 

(2) Find the corresponding F-value in Table 7. 

(3) Calculate the reciprocal of the F-value to obtain the left-tailed critical F-value. 

In Exercises 31 and 32, find the right- and left-tailed critical F-values for a two-tailed 
test with the given values of α, d.f.N, and d.f.D. 

31. α = 0.05, d.f.N = 6, d.f.D = 3 

32. α = 0.10, d.f.N = 20, d.f.D = 15 

Confidence Interval for $\sigma_1^2/\sigma_2^2$ When $s_1^2$ and $s_2^2$ are the variances of 
randomly selected, independent samples from normally distributed populations, 
then a confidence interval for $\sigma_1^2/\sigma_2^2$ is 

$$\frac{s_1^2}{s_2^2} F_L < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} F_R$$

where $F_L$ is the left-tailed critical F-value and $F_R$ is the right-tailed critical F-value. 
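The reciprocal rule for the left-tailed critical value and the interval formula can be combined in a short Python sketch. The function names are hypothetical, and the two table values below are placeholders chosen for easy arithmetic; they are not read from Table 7 and do not correspond to any exercise.

```python
def left_tailed_critical_f(table_value_swapped_df):
    """Reciprocal of the table F-value found after interchanging d.f.N and d.f.D."""
    return 1.0 / table_value_swapped_df


def variance_ratio_interval(s1_sq, s2_sq, f_left, f_right):
    """Endpoints of (s1^2/s2^2) * F_L < sigma1^2/sigma2^2 < (s1^2/s2^2) * F_R."""
    ratio = s1_sq / s2_sq
    return ratio * f_left, ratio * f_right


# Placeholder inputs (assumptions for illustration only).
f_right = 3.0          # right-tailed critical value, as if read from a table
f_swapped = 4.0        # table value with the degrees of freedom interchanged
f_left = left_tailed_critical_f(f_swapped)
low, high = variance_ratio_interval(s1_sq=8.0, s2_sq=2.0, f_left=f_left, f_right=f_right)
print(f_left, low, high)
```

Keeping the table lookup outside the functions makes it explicit that the critical values come from Table 7 (or a technology tool), while the code only performs the arithmetic described in the text.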


In Exercises 33 and 34, construct the indicated confidence interval for $\sigma_1^2/\sigma_2^2$. 
Assume the samples are independent and each population has a normal 
distribution. 


33. Cholesterol Contents In a recent study of the cholesterol contents of grilled 
chicken sandwiches served at fast food restaurants, a nutritionist found that 
random samples of sandwiches from Burger King and from McDonald’s had the 
sample statistics shown in the table. Construct a 95% confidence interval for 
$\sigma_1^2/\sigma_2^2$, where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the cholesterol contents of 
grilled chicken sandwiches from Burger King and McDonald’s, respectively. 
(Adapted from Burger King Brands, Inc. and McDonald’s Corporation) 

Restaurant         Burger King         McDonald’s 
Sample variance    $s_1^2 = 10.89$     $s_2^2 = 9.61$ 
Sample size        $n_1 = 16$          $n_2 = 12$ 


34. Carbohydrate Contents A fast food study found that the carbohydrate 
contents of 16 randomly selected grilled chicken sandwiches from Burger King 
had a variance of 5.29. The study also found that the carbohydrate contents 
of 12 randomly selected grilled chicken sandwiches from McDonald’s had a 
variance of 3.61. Construct a 95% confidence interval for $\sigma_1^2/\sigma_2^2$, where $\sigma_1^2$ 
and $\sigma_2^2$ are the variances of the carbohydrate contents of grilled chicken 
sandwiches from Burger King and McDonald’s, respectively. (Adapted from 
Burger King Brands, Inc. and McDonald’s Corporation) 
WHAT YOU SHOULD LEARN 


» How to use one-way analysis 
of variance to test claims 
involving three or more means 


» An introduction to two-way 
analysis of variance 


Analysis of Variance 


One-Way ANOVA » Two-Way ANOVA 

» ONE-WAY ANOVA 


Suppose a medical researcher is analyzing the effectiveness of three types of pain 
relievers and wants to determine whether there is a difference in the mean 
lengths of time it takes the three medications to provide relief. To determine 
whether such a difference exists, the researcher can use the F-distribution 
together with a technique called analysis of variance. Because one independent 
variable is being studied, the process is called one-way analysis of variance. 


DEFINITION 


One-way analysis of variance is a hypothesis-testing technique that is used to 
compare the means of three or more populations. Analysis of variance is 
usually abbreviated as ANOVA. 


To begin a one-way analysis of variance test, you should first state the null 
and alternative hypotheses. For a one-way ANOVA test, the null and alternative 
hypotheses are always similar to the following statements. 


$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$ (All population means are equal.) 

$H_a$: At least one mean is different from the others. 

When you reject the null hypothesis in an ANOVA test, you can conclude 
that at least one of the means is different from the others. Without performing 
more statistical tests, however, you cannot determine which of the means is 
different. 


In a one-way ANOVA test, the following conditions must be true. 


1. Each sample must be randomly selected from a normal, or approximately 
normal, population. 

2. The samples must be independent of each other. 

3. Each population must have the same variance. 


The test statistic for a one-way ANOVA test is the ratio of two variances: the 
variance between samples and the variance within samples. 


$$\text{Test statistic} = \frac{\text{Variance between samples}}{\text{Variance within samples}}$$


1. The variance between samples $MS_B$ measures the differences related to the 
treatment given to each sample and is sometimes called the mean square 
between. 


2. The variance within samples $MS_W$ measures the differences related to entries 
within the same sample. This variance, sometimes called the mean square 
within, is usually due to sampling error. 

The notations $n_i$, $\bar{x}_i$, and $s_i^2$ 
represent the sample size, 
mean, and variance of the 
ith sample, respectively. 
$\bar{\bar{x}}$ is sometimes called 
the grand mean. 

ONE-WAY ANALYSIS OF VARIANCE TEST 


If the conditions for a one-way analysis of variance test are satisfied, then the 
sampling distribution for the test is approximated by the F-distribution. The 
test statistic is 

$$F = \frac{MS_B}{MS_W}.$$

The degrees of freedom for the F-test are 

$$\text{d.f.}_N = k - 1$$

and 

$$\text{d.f.}_D = N - k$$

where k is the number of samples and N is the sum of the sample sizes. 


If there is little or no difference between the means, then $MS_B$ will be 
approximately equal to $MS_W$ and the test statistic will be approximately 1. 
Values of F close to 1 suggest that you should fail to reject the null hypothesis. 
However, if one of the means differs significantly from the others, $MS_B$ will be 
greater than $MS_W$ and the test statistic will be greater than 1. Values of F 
significantly greater than 1 suggest that you should reject the null hypothesis. As 
such, all one-way ANOVA tests are right-tailed tests. That is, if the test statistic is 
greater than the critical value, $H_0$ will be rejected. 


GUIDELINES 


Finding the Test Statistic for a One-Way ANOVA Test 


IN WORDS, followed by IN SYMBOLS 

1. Find the mean and variance of each sample. 
   $\bar{x} = \dfrac{\sum x}{n}$, $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$ 

2. Find the mean of all entries in all samples (the grand mean). 
   $\bar{\bar{x}} = \dfrac{\sum x}{N}$ 

3. Find the sum of squares between the samples. 
   $SS_B = \sum n_i (\bar{x}_i - \bar{\bar{x}})^2$ 

4. Find the sum of squares within the samples. 
   $SS_W = \sum (n_i - 1) s_i^2$ 

5. Find the variance between the samples. 
   $MS_B = \dfrac{SS_B}{\text{d.f.}_N} = \dfrac{\sum n_i (\bar{x}_i - \bar{\bar{x}})^2}{k - 1}$ 

6. Find the variance within the samples. 
   $MS_W = \dfrac{SS_W}{\text{d.f.}_D} = \dfrac{\sum (n_i - 1) s_i^2}{N - k}$ 

7. Find the test statistic. 
   $F = \dfrac{MS_B}{MS_W}$ 


Note that in Step 1, you are summing the values from just one sample. In Step 2, 
you are summing the values from all of the samples. 
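Steps 2 through 7 can be sketched in Python when only the summary statistics of each sample are available. The function name and the input values are hypothetical; the grand mean is computed as the size-weighted mean of the sample means, which equals the sum of all entries divided by N.

```python
def one_way_anova_from_summary(sizes, means, variances):
    """Return (F, d.f.N, d.f.D) from each sample's size, mean, and variance."""
    k = len(sizes)                      # number of samples
    n_total = sum(sizes)                # N, the sum of the sample sizes
    # Step 2: grand mean (sum of all entries divided by N).
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    # Step 3: sum of squares between the samples.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Step 4: sum of squares within the samples.
    ss_within = sum((n - 1) * v for n, v in zip(sizes, variances))
    # Steps 5 and 6: mean squares.
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Step 7: test statistic.
    return ms_between / ms_within, k - 1, n_total - k


# Made-up summary statistics for three samples of size 3.
f_stat, df_n, df_d = one_way_anova_from_summary(
    sizes=[3, 3, 3], means=[10.0, 12.0, 14.0], variances=[4.0, 4.0, 4.0]
)
print(f_stat, df_n, df_d)
```

For these made-up inputs, $SS_B = 24$ and $SS_W = 24$, so $MS_B = 12$, $MS_W = 4$, and F = 3 with 2 and 6 degrees of freedom.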

The notation $SS_B$ represents the sum of squares between the samples. 

$$SS_B = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2 = \sum n_i(\bar{x}_i - \bar{\bar{x}})^2$$

The notation $SS_W$ represents the sum of squares within the samples. 

$$SS_W = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2 = \sum (n_i - 1)s_i^2$$


GUIDELINES 


Performing a One-Way Analysis of Variance Test 


IN WORDS, followed by IN SYMBOLS 

1. Identify the claim. State the null and alternative hypotheses. 
   State $H_0$ and $H_a$. 

2. Specify the level of significance. 
   Identify α. 

3. Determine the degrees of freedom. 
   $\text{d.f.}_N = k - 1$, $\text{d.f.}_D = N - k$ 

4. Determine the critical value. 
   Use Table 7 in Appendix B. 

5. Determine the rejection region. 

6. Find the test statistic and sketch the sampling distribution. 
   $F = \dfrac{MS_B}{MS_W}$ 

7. Make a decision to reject or fail to reject the null hypothesis. 
   If F is in the rejection region, reject $H_0$. Otherwise, fail to reject $H_0$. 

8. Interpret the decision in the context of the original claim. 


Tables are a convenient way to summarize the results of a one-way analysis 
of variance test. ANOVA summary tables are set up as shown below. 


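ANOVA Summary Table 

Variation   Sum of squares   Degrees of freedom   Mean squares                      F 
Between     $SS_B$           $\text{d.f.}_N$      $MS_B = SS_B/\text{d.f.}_N$       $MS_B/MS_W$ 
Within      $SS_W$           $\text{d.f.}_D$      $MS_W = SS_W/\text{d.f.}_D$ 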

EXAMPLE 1 


> Performing a One-Way ANOVA Test 


A medical researcher wants to determine whether there is a difference in the 
mean lengths of time it takes three types of pain relievers to provide relief 
from headache pain. Several headache sufferers are randomly selected and 
given one of the three medications. Each headache sufferer records the time (in 
minutes) it takes the medication to begin working. The results are shown in the 
table. At α = 0.01, can you conclude that at least one mean time is different 
from the others? Assume that each population of relief times is normally 
distributed and that the population variances are equal. 


Medication 1                  Medication 2                  Medication 3 
12                            16                            14 
15                            14                            17 
17                            21                            20 
12                            15                            15 
                              19 
$n_1 = 4$                     $n_2 = 5$                     $n_3 = 4$ 
$\bar{x}_1 = 56/4 = 14$       $\bar{x}_2 = 85/5 = 17$       $\bar{x}_3 = 66/4 = 16.5$ 
$s_1^2 = 6$                   $s_2^2 = 8.5$                 $s_3^2 = 7$ 


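> Solution The null and alternative hypotheses are as follows. 


$H_0: \mu_1 = \mu_2 = \mu_3$ 

$H_a$: At least one mean is different from the others. (Claim) 

Because there are k = 3 samples, d.f.N = k - 1 = 3 - 1 = 2. The sum of the 
sample sizes is $N = n_1 + n_2 + n_3 = 4 + 5 + 4 = 13$. So, d.f.D = N - k = 
13 - 3 = 10. Using d.f.N = 2, d.f.D = 10, and α = 0.01, the critical value is 
$F_0 = 7.56$. To find the test statistic, first calculate $\bar{\bar{x}}$, $MS_B$, and $MS_W$. 

$$\bar{\bar{x}} = \frac{\sum x}{N} = \frac{56 + 85 + 66}{13} \approx 15.92$$

$$MS_B = \frac{SS_B}{\text{d.f.}_N} = \frac{\sum n_i(\bar{x}_i - \bar{\bar{x}})^2}{k - 1} \approx \frac{4(14 - 15.92)^2 + 5(17 - 15.92)^2 + 4(16.5 - 15.92)^2}{3 - 1} = \frac{21.9232}{2} = 10.9616$$

$$MS_W = \frac{SS_W}{\text{d.f.}_D} = \frac{\sum (n_i - 1)s_i^2}{N - k} = \frac{(4 - 1)(6) + (5 - 1)(8.5) + (4 - 1)(7)}{13 - 3} = \frac{73}{10} = 7.3$$

Using $MS_B \approx 10.9616$ and $MS_W = 7.3$, the test statistic is 

$$F = \frac{MS_B}{MS_W} \approx \frac{10.9616}{7.3} \approx 1.50.$$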

A researcher wants to determine 
whether there is a difference in 
the mean lengths of time wasted 
at work for people in California, 
Georgia, and Pennsylvania. 
Several people from each state 
who work 8-hour days are 
randomly selected and they are 
asked how much time (in hours) 
they waste at work each day. 
The results are shown in the 
table. (Adapted from America Online and 
Salary.com) 


2.5 
1.75 
1.5 
2.29: 


At α = 0.10, can the researcher 
conclude that there is a 
difference in the mean lengths 
of time wasted at work among 
the states? Assume that each 
population is normally 
distributed and that the 
population variances are equal. 

The graph shows the location of the 
rejection region and the test statistic. 
Because F is not in the rejection 
region, you should fail to reject the null 
hypothesis. 


Interpretation There is not enough 
evidence at the 1% level of significance 
to conclude that there is a difference in 
the mean length of time it takes the 
three pain relievers to provide relief 
from headache pain. 


The ANOVA summary table for Example 1 is shown below. 


Variation   Sum of squares   Degrees of freedom   Mean squares   F 
Between     21.9232          2                    10.9616        1.50 
Within      73               10                   7.3 
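The calculation in Example 1 can be checked from the raw relief times with Python's standard `statistics` module. This is a sketch rather than part of the text: the labels "Medication 1" through "Medication 3" are placeholders for the three table columns, and the critical value 7.56 is copied from the example instead of being computed.

```python
from statistics import mean, variance

# Relief times (in minutes) from the table in Example 1; labels are placeholders.
samples = {
    "Medication 1": [12, 15, 17, 12],
    "Medication 2": [16, 14, 21, 15, 19],
    "Medication 3": [14, 17, 20, 15],
}

k = len(samples)
n_total = sum(len(v) for v in samples.values())
grand_mean = sum(sum(v) for v in samples.values()) / n_total

# variance() is the sample variance (divides by n - 1), as in the text.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in samples.values())
ss_within = sum((len(v) - 1) * variance(v) for v in samples.values())

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within

critical_value = 7.56          # F_0 for d.f.N = 2, d.f.D = 10, alpha = 0.01 (from the example)
reject_null = f_stat > critical_value

print(f"Between: SS={ss_between:.4f}, df={k - 1}, MS={ms_between:.4f}, F={f_stat:.2f}")
print(f"Within:  SS={ss_within:.4f}, df={n_total - k}, MS={ms_within:.4f}")
print("Reject H0?", reject_null)
```

Because the grand mean is not rounded to 15.92 here, the between-samples sum of squares comes out near 21.9231 rather than 21.9232; the test statistic still rounds to 1.50 and the decision is unchanged.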


> Try It Yourself 1 


A sales analyst wants to determine whether there is a difference in the mean 
monthly sales of a company’s four sales regions. Several salespersons from 
each region are randomly selected and they provide their sales amounts (in 
thousands of dollars) for the previous month. The results are shown in the 
table. At α = 0.05, can the analyst conclude that there is a difference in the 
mean monthly sales among the sales regions? Assume that each population of 
sales is normally distributed and that the population variances are equal. 


34                      47                  40                      21 
28                      36                  30                      30 
18                      30                  41                      24 
24                      38                  29                      37 
                        44                                          23 
$n_1 = 4$               $n_2 = 5$           $n_3 = 4$               $n_4 = 5$ 
$\bar{x}_1 = 26$        $\bar{x}_2 = 39$    $\bar{x}_3 = 35$        $\bar{x}_4 = 27$ 
$s_1^2 \approx 45.33$   $s_2^2 = 45$        $s_3^2 \approx 40.67$   $s_4^2 = 42.5$ 


a. Identify the claim and state $H_0$ and $H_a$. 
b. Specify the level of significance α. 
c. Determine the degrees of freedom for the numerator and for the denominator. 
d. Determine the critical value and the rejection region. 
e. Find the test statistic F. Sketch a graph. 
f. Decide whether to reject the null hypothesis. 
g. Interpret the decision in the context of the original claim. 
Answer: Page A46 

55    110   110 
5     45    87 
45    45    70 
40    40    65 
28    33    52 
27    30    35 
25    30    35 
25    22    30 
17    15    25 
15    10    18 
12          14 
10 
8 
7 
6 

STUDY TIP 


Here are instructions for 
performing a one-way analysis 
of variance test on a TI-83/84 Plus. 

Begin by storing the data in 
List 1, List 2, and so on, 
depending on the data. 

STAT 

Choose the TESTS menu. 

F: ANOVA( 

Then enter L1, L2, and 
so on, separated by 
commas. 

Using technology greatly simplifies the one-way ANOVA process. When 
using a technology tool such as Excel, MINITAB, or the TI-83/84 Plus to perform 
a one-way analysis of variance test, you can use P-values to decide whether to 
reject the null hypothesis. If the P-value is less than α, you should reject $H_0$. 


EXAMPLE 2 Report 49 


» Using Technology to Perform ANOVA Tests 


A researcher believes that the mean earnings of top-paid actors, athletes, and 
musicians are the same. The earnings (in millions of dollars) for several 
randomly selected people from each category are shown in the table at the 
left. Assume that the populations are normally distributed, the samples are 
independent, and the population variances are equal. At α = 0.10, can you 
reject the claim that the mean earnings are the same for the three categories? 
Use a technology tool. (Source: Forbes.com LLC) 


> Solution The null and alternative hypotheses are as follows. 

$H_0: \mu_1 = \mu_2 = \mu_3$ (Claim) 

$H_a$: At least one mean is different from the others. 


The results obtained by performing the test on a TI-83/84 Plus are shown 
below. From the results, you can see that P ≈ 0.06. Because P < α, you 
should reject the null hypothesis. 


TI-83/84 PLUS 

    One-way ANOVA 
     F=3.051763393 
     p=.0608019049 
     Factor 
      df=2 
      SS=3766.36364 
      MS=1883.18182 

TI-83/84 PLUS 

    One-way ANOVA 
      MS=1883.18182 
     Error 
      df=33 
      SS=20363.6364 
      MS=617.07989 
     Sxp=24.8410928 


Interpretation There is enough evidence at the 10% level of significance to 
reject the claim that the mean earnings are the same. 
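The P-value reported by the calculator can be reproduced without statistical software in the special case d.f.N = 2, where the right-tail area of the F-distribution has the closed form $P(F > f) = (1 + 2f/\text{d.f.}_D)^{-\text{d.f.}_D/2}$. The sketch below relies on that identity, which is not part of the text and does not hold for other numerator degrees of freedom; the function name is hypothetical, and the mean squares are copied from the calculator output.

```python
def f_right_tail_df_n_2(f_stat, df_d):
    """Right-tail area P(F > f_stat) for an F-distribution with d.f.N = 2 only."""
    return (1.0 + 2.0 * f_stat / df_d) ** (-df_d / 2.0)


ms_between = 1883.18182     # MS for Factor, from the TI-83/84 Plus output
ms_within = 617.07989       # MS for Error, from the TI-83/84 Plus output
df_d = 33                   # Error degrees of freedom
alpha = 0.10

f_stat = ms_between / ms_within
p_value = f_right_tail_df_n_2(f_stat, df_d)
reject_null = p_value < alpha

print(round(f_stat, 3), round(p_value, 4), reject_null)
```

The result agrees with the calculator's P-value of about 0.0608, which is less than α = 0.10, so the decision to reject the null hypothesis follows. For other numerator degrees of freedom, a technology tool or Table 7 is needed.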
> Try It Yourself 2 


The data shown in the table represent the GPAs of randomly selected freshmen, 
sophomores, juniors, and seniors. At α = 0.05, can you conclude that there is 
a difference in the means of the GPAs? Assume that the populations of GPAs 
are normally distributed and that the population variances are equal. Use a 
technology tool. 


2.34 2.38 3.31 2.39 3.40 2.70 2.34 
3.26 2.22 3.26 3.29 2.95 3.01 3.13 3.59 2.84 3.00 
2.80 2.60 2.49 2.83 2.34 3.23 3.49 3.03 2.87 
3.31 
2.35 3.27 2.86 2.78 2.75 3.05 3.31 


a. Identify the claim and state $H_0$ and $H_a$. 
b. Use a technology tool to enter the data. 
c. Perform the ANOVA test to find the P-value. 
d. Decide whether to reject the null hypothesis. 
e. Interpret the decision in the context of the claim. 
Answer: Page A47 

» TWO-WAY ANOVA 


When you want to test the effect of two independent variables, or factors, on one 
dependent variable, you can use a two-way analysis of variance test. For instance, 
suppose a medical researcher wants to test the effect of gender and type of 
medication on the mean length of time it takes pain relievers to provide relief. 
To perform such an experiment, the researcher can use the following two-way 
ANOVA block design. 


                              Gender 
                        M                          F 
Type of          I      Males taking type I        Females taking type I 
medication       II     Males taking type II       Females taking type II 
                 III    Males taking type III      Females taking type III 


INSIGHT 

The conditions for a two-way ANOVA test are the same as those for a one-way 
ANOVA test with the additional condition that all samples must be of equal size. 


A two-way ANOVA test has three null hypotheses: one for each main effect 
and one for the interaction effect. A main effect is the effect of one independent 
variable on the dependent variable, and the interaction effect is the effect of both 
independent variables on the dependent variable. For instance, the hypotheses 
for the pain reliever experiment are as follows. 


INSIGHT 

If gender and type of medication have no effect on the length of time it takes a 
pain reliever to provide relief, then there will be no significant difference in the 
means of the relief times. 


Hypotheses for main effects: 


$H_0$: Gender has no effect on the mean length of time it takes a pain reliever 
to provide relief. 


$H_a$: Gender has an effect on the mean length of time it takes a pain reliever 
to provide relief. 


$H_0$: The type of medication has no effect on the mean length of time it takes 
a pain reliever to provide relief. 


$H_a$: The type of medication has an effect on the mean length of time it takes 
a pain reliever to provide relief. 


Hypotheses for interaction effect: 


$H_0$: There is no interaction effect between gender and type of medication on 
the mean length of time it takes a pain reliever to provide relief. 


$H_a$: There is an interaction effect between gender and type of medication on 
the mean length of time it takes a pain reliever to provide relief. 

To test these hypotheses, you can perform a two-way ANOVA test. Using the 
F-distribution, a two-way ANOVA test calculates an F-test statistic for each 
hypothesis. As a result, it is possible to reject none, one, two, or all of the null 
hypotheses. The statistics involved with a two-way ANOVA test are beyond the 
scope of this course. You can, however, use a technology tool such as MINITAB 
to perform a two-way ANOVA test. 

BUILDING BASIC SKILLS AND VOCABULARY 

1. State the null and alternative hypotheses for a one-way ANOVA test. 

2. What conditions are necessary in order to use a one-way ANOVA test? 

3. Describe the difference between the variance between samples $MS_B$ and the 
variance within samples $MS_W$. 

4. Describe the hypotheses for a two-way ANOVA test. 


USING AND INTERPRETING CONCEPTS 


Performing a One-Way ANOVA Test In Exercises 5-14, (a) identify the 
claim and state $H_0$ and $H_a$, (b) find the critical value and identify the rejection 
region, (c) find the test statistic F, (d) decide whether to reject or fail to reject 
the null hypothesis, and (e) interpret the decision in the context of the original 
claim. If convenient, use technology to solve the problem. In each exercise, assume 
that each population is normally distributed and that the population variances 
are equal. 


5. Toothpaste The table shows the cost per ounce (in dollars) for a random 
sample of toothpastes exhibiting very good stain removal, good stain removal, 
and fair stain removal. At α = 0.05, can you conclude that the mean costs per 
ounce are different? (Source: Consumer Reports) 


0.47 
0.49 
0.33 
1.52 
0.64 
0.36 
0.41 
0.37 
0.48 
0.50 
0.51 
0.35 


0.60 
0.64 
1.05 
2.73 
0.58 
0.75 
0.22 
0.33 
0.42 
0.46 
0.98 
1.16 


0.34 
0.46 
1.31 
0.44 
0.60 


6. Automobile Batteries The prices (in dollars) of 17 randomly selected 
automobile batteries are shown in the table. The prices are classified 
according to battery type. At α = 0.05, is there enough evidence to conclude 
that at least one of the mean battery prices is different from the others? 
(Source: Consumer Reports) 


90 90 7 
100 75 105 
115 75 75 
105 
100 
90 


65 
100 
110 


75 
90 

7. Government Salaries The table shows the salaries (in thousands of 
dollars) of randomly selected individuals from the federal, state, and local 
levels of government. At α = 0.01, can you conclude that at least one 
mean salary is different? (Adapted from Bureau of Labor Statistics) 


63.7 50.6 47.1 
56.4 34.7 36.6 
67.8 S17 40.9 
75.6 52.2 39.3 
74.9 54.4 49.9 
79.0 59.5 44.0 
49.6 37.6 58.6 
64.5 48.1 39.1 
74.2 D138 35.5 
57.9 45.1 31.7 


8. Late Night Hosts The table shows the ages (in years) of randomly 
selected viewers for several late night hosts. At α = 0.05, can you conclude 
that at least one mean age is different? (Adapted from The New York Times) 


19 22 18 41 2 
28 27 21 43 34 
37 32 33 44 41 
43 37 42 47 43 
48 45 48 49 48 
48 53 57 51 54 
53 55 59 59 57 
54 62 61 61 59 
54 64 62 62 60 
57 67 64 64 63 
62 68 68 65 69 
67 70 68 67 71 
79 72 75 TA 73 


9. Cost Per Mile The table shows the cost per mile (in cents) for a random 
sample of automobiles. At α = 0.01, can you conclude that at least one mean 
cost per mile is different? (Adapted from American Automobile Association) 


40 62 59 84 63 

38 44 68 63 73 

46 58 78 72 56 

51 54 70 75 48 

43 59 75 67 
47 67 

10. Well-Being Index The well-being index is a way to measure how 
people are faring physically, emotionally, socially, and professionally, 
as well as to rate the overall quality of their lives and their outlooks 
for the future. The table shows the well-being index scores for a 
random sample of states from four regions of the United States. 
At α = 0.10, can you reject the claim that the mean score is the same 
for all regions? (Adapted from Gallup and Healthways) 

Northeast   Midwest   South   West 
66.9        67.3      67.0    70.2 
67.4        67.6      66.2    68.3 
65.0        67.8      66.1    67.3 
65.4        67.2      66.8    68.3 
[two rows illegible in the source] 
63.6 65.1 63.8 
63.9 64.2 65.3 
63.9 66.0 
64.0 
60.5 

11. Days Spent at the Hospital In a recent study, a health insurance 
company investigated the number of days patients spent at the hospital. 
In part of the study, the company randomly selected patients from 
various parts of the United States and recorded the number of days 
each patient spent at the hospital. The results of the study are shown in 
the table. At α = 0.01, can the company reject the claim that the mean 
number of days patients spend at the hospital is the same for all four 
regions? (Adapted from National Center for Health Statistics) 

[Table for Exercise 11: entries illegible in the source] 


12. Personal Income The table shows the salaries of randomly selected 
individuals from six large metropolitan areas. At α = 0.05, can you 
conclude that the mean salary is different in at least one of the areas? 
(Adapted from U.S. Bureau of Economic Analysis) 


41,950 34,315 43,500 41,400 46,000 56,135 
36,100 31,500 47,350 42,580 43,100 46,500 
45,200 38,000 34,700 46,600 41,550 44,400 
51,400 49,495 46,500 49,900 52,300 51,000 
50,920 38,700 39,050 53,175 39,400 55,875 
40,500 51,050 42,900 44,000 
45,060 47,700 

13. Energy Consumption The table shows the energy consumed (in 
millions of Btu) in one year for a random sample of households from 
four regions of the United States. At α = 0.10, can you conclude that 
the mean energy consumption of at least one region is different from 
the others? (Adapted from U.S. Energy Information Administration) 


101.5 
109.5 
153.6 
129.0 
160.3 
114.6 

85.2 
173.0 

73.6 


14. Amount Spent on Energy The table shows the amount spent (in 
dollars) on energy in one year for a random sample of households from 
four regions of the United States. At α = 0.05, can you reject the claim 
that the mean amounts spent are equal for all regions? (Adapted from 
U.S. Energy Information Administration) 


56.7 
174.8 
61.6 
79.3 
160.9 
179.9 
98.6 
132.1 
89.5 
155.5 
61.9 


1456 
3025 
2029 
1735 
1956 
3078 
3023 
1709 
3425 
1684 


In Exercises 15 and 16, use StatCrunch to perform a one-way ANOVA test. 
Decide whether to reject the null hypothesis. Then, interpret the decision in the 
context of the original claim. 


15. Sports Team Involvement The table shows the number of female students 
who played on a sports team in grades 9 through 12 for a random sample 
of 8 high schools in a state. At α = 0.01, can you reject the claim that the 
mean numbers of female students who played on a sports team are equal for 
all grades? 


91 
87 
81 
77 


1940 
1570 
1972 
1924 
1820 
2144 
1319 
1655 
1730 


53 
58 
46 
42 
133 
125 
115 
102 


68.8 
92.8 
56.5 
96.6 
57.0 
51.3 
63.2 
50.6 
182.0 


1457 
2202 
1883 
1310 
1876 
1578 
1980 


64 
51 
56 
49 


31.0 
46.2 
127.7 
61.4 
108.4 
69.8 
44.5 
124.8 
98.6 
58.3 


1168 
1927 

989 
2022 
1330 
1184 
1819 


112 
106 
87 
84 


63 
58 
62 
65 


77 
72 
80 
82 

16. Housing Prices The table shows the sale prices (in thousands of dollars) 
of randomly selected one-family houses in three cities. At α = 0.10, can you 
conclude that at least one mean sale price is different? (Adapted from National 
Association of Realtors) 


179.0 253.9 229.9 
151.5 211.1 114.9 
196.6 195.3 210.2 
192.3 197.5 202.7 
254.7 217.9 149.1 
212.4 244.8 166.0 

92.8 263.2 133.3 
210.6 154.7 213.4 
180.5 173.9 104.7 
226.0 183.3 215.4 
179.0 


EXTENDING CONCEPTS 


Using Technology to Perform a Two-Way ANOVA Test In Exercises 
17-20, use a technology tool and the given block design to perform a two-way 
ANOVA test. Use α = 0.10. Interpret the results. 


17. Advertising A study was conducted in which a random sample of 20 
adults was asked to rate the effectiveness of advertisements. Each adult 
rated a radio or television advertisement that lasted 30 or 60 seconds. 
The block design shows these ratings (on a scale of 1 to 5, with 5 being 
extremely effective). 


Advertising medium 


Radio Television 
3 
S 30 sec Devo 1a 5) 3h, 5), 44 Il, 2 
S 
Sp 
5 60 sec il, 44, 2, 2,5 2. 5), 3), al, Al 
J 


18. Vehicle Sales The owner of a car dealership wants to determine if the 
gender of a salesperson and the type of vehicle sold affect the number of 
vehicles sold in a month. The block design shows the number of vehicles, 
listed by type, sold in a month by a random sample of eight salespeople. 


Type of vehicle 
Car Truck Van/SUV 
5 Male ©, 5, 2), 5 Dede, Mp 8) 4, 3, 4,2 
E 
O Female by, daha 7 i, @, 1,2 4, 2,0, 1 

19. Grade Point Average A study was conducted in which a random 
sample of 24 high school students was asked to give their grade point 
average (GPA). The block design shows the GPAs of male and female 
students from four different age groups. 


Age 
Under 15 15-16 17-18 Over 18 
5 Male| 2.5, 2.1, 3.8 4.0, 1.4, 2.0 35 D2), D0 3.11, O78, 28 
so} 
f=} 
oO 
O Female | 4.0, 2.1, 1.9 3.5, 3.0, 2.1 4.0, 2.2, 1.7 1.6, 2.5, 3.6 


20. Disk Drive Repairs The manager of a computer repair service wants 
to determine whether there is a difference in the time it takes four 
technicians to repair different brands of disk drives. The block design 
shows the times (in minutes) it took for each technician to repair three 
disk drives of each brand. 


                              Technician 
Brand      Technician 1   Technician 2   Technician 3   Technician 4 
Brand A    67, 82, 64     42, 56, 39     69, 47, 38     70, 44, 50 
Brand B    44, 62, 55     47, 58, 62     55, 45, 66     47, 29, 40 
Brand C    47, 36, 68     39, 74, 51     74, 80, 70     45, 62, 59 

The Scheffé Test If the null hypothesis is rejected in a one-way ANOVA test 
of three or more means, a Scheffé Test can be performed to find which means have 
a significant difference. In a Scheffé Test, the means are compared two at a time. 
For instance, with three means you would have the following comparisons: $\bar{x}_1$ 
versus $\bar{x}_2$, $\bar{x}_1$ versus $\bar{x}_3$, and $\bar{x}_2$ versus $\bar{x}_3$. For each comparison, calculate 

$$\frac{(\bar{x}_a - \bar{x}_b)^2}{\dfrac{SS_W}{\sum (n_i - 1)}\left[\dfrac{1}{n_a} + \dfrac{1}{n_b}\right]}$$

where $\bar{x}_a$ and $\bar{x}_b$ are the means being compared and $n_a$ and $n_b$ are the 
corresponding sample sizes. Calculate the critical value using the same steps as in 
a one-way ANOVA test and multiply the result by k - 1. Then compare the value 
that is calculated using the formula above with the critical value. The means have 
a significant difference if the critical value is less than the value calculated using the 
formula above. 
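The pairwise comparisons described above can be sketched in Python as follows. Everything in the example is hypothetical: the function name, the three sample means, the sample sizes, and $SS_W$ are made up, and the one-way ANOVA critical value 3.89 is an assumed table value for α = 0.05 with d.f.N = 2 and d.f.D = 12.

```python
from itertools import combinations


def scheffe_statistic(mean_a, mean_b, n_a, n_b, ss_within, df_within):
    """Value of the Scheffe comparison formula for one pair of sample means."""
    ms_within = ss_within / df_within            # SS_W divided by the sum of (n_i - 1)
    return (mean_a - mean_b) ** 2 / (ms_within * (1 / n_a + 1 / n_b))


# Made-up summary statistics for k = 3 samples.
means = [10.0, 14.0, 11.0]
sizes = [5, 5, 5]
ss_within = 48.0
k = len(means)
df_within = sum(n - 1 for n in sizes)            # 12

f0 = 3.89                                        # assumed one-way ANOVA critical value
scheffe_critical = f0 * (k - 1)                  # multiply the critical value by k - 1

results = {}
for a, b in combinations(range(k), 2):
    stat = scheffe_statistic(means[a], means[b], sizes[a], sizes[b], ss_within, df_within)
    # The pair differs significantly when the critical value is less than the statistic.
    results[(a + 1, b + 1)] = (stat, scheffe_critical < stat)

for pair, (stat, significant) in results.items():
    print(pair, round(stat, 3), significant)
```

With these made-up numbers only the first and second means differ significantly, which illustrates how the test singles out pairs after the overall null hypothesis has been rejected.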


Use the information above to solve Exercises 21-24. 


21. Refer to the data in Exercise 7. At α = 0.01, perform a Scheffé Test to 
determine which means have a significant difference. 


22. Refer to the data in Exercise 9. At α = 0.01, perform a Scheffé Test to 
determine which means have a significant difference. 


23. Refer to the data in Exercise 10. At α = 0.10, perform a Scheffé Test to 
determine which means have a significant difference. 


24. Refer to the data in Exercise 14. At α = 0.05, perform a Scheffé Test to 
determine which means have a significant difference. 

USES AND ABUSES 


Uses 


One-Way Analysis of Variance (ANOVA) ANOVA can help you make 
important decisions about the allocation of resources. For instance, suppose 
you work for a large manufacturing company and part of your responsibility 
is to determine the distribution of the company’s sales throughout the world and 
decide where to focus the company’s efforts. Because wrong decisions will cost 
your company money, you want to make sure that you make the right decisions. 


Abuses 


Preconceived Notions There are several ways that the tests presented in 
this chapter can be abused. For instance, it is easy to allow preconceived 
notions to affect the results of a chi-square goodness-of-fit test and a test for 
independence. When testing to see whether a distribution has changed, do not let 
the existing distribution “cloud” the study results. Similarly, when determining 
whether two variables are independent, do not let your intuition “get in the 
way.” As with any hypothesis test, you must properly gather appropriate data 
and perform the corresponding test before you can reach a logical conclusion. 


Incorrect Interpretation of Rejection of Null Hypothesis It is important to 
remember that when you reject the null hypothesis of an ANOVA test, you are 
simply stating that you have enough evidence to determine that at least one of 
the population means is different from the others. You are not finding them all 
to be different. One way to further test which of the population means differs 
from the others is explained in Extending Concepts in Section 10.4 Exercises. 


EXERCISES


1. Preconceived Notions ANOVA depends on having independent variables. 
Describe an abuse that might occur by having dependent variables. Then 
describe how the abuse could be avoided. 


2. Incorrect Interpretation of Rejection of Null Hypothesis Find an 
example of the use of ANOVA. In that use, describe what would be meant 
by “rejection of the null hypothesis.” How should rejection of the null 
hypothesis be correctly interpreted? 
CHAPTER SUMMARY


What did you learn?                                          EXAMPLE(S)   REVIEW EXERCISES


Section 10.1

■ How to use the chi-square distribution to test whether     1-3          1-4
  a frequency distribution fits a claimed distribution

\[ \chi^2 = \Sigma \frac{(O - E)^2}{E} \]


Section 10.2

■ How to use a contingency table to find expected            1            5-8
  frequencies

\[ E_{r,c} = \frac{(\text{Sum of row } r) \times (\text{Sum of column } c)}{\text{Sample size}} \]

■ How to use a chi-square distribution to test whether       2, 3         5-8
  two variables are independent


Section 10.3

■ How to interpret the F-distribution and use an F-table     1, 2         9-16
  to find critical values

■ How to perform a two-sample F-test to compare two          3, 4         17-22
  variances


Section 10.4

■ How to use one-way analysis of variance to test claims     1, 2         23, 24
  involving three or more means

\[ F = \frac{MS_B}{MS_W} \]
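
The two chi-square formulas in the summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 2 × 2 table are hypothetical and are not data from the text.

```python
def expected_frequencies(table):
    """E[r][c] = (sum of row r) * (sum of column c) / sample size."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical contingency table (rows and columns are made up).
observed_table = [[20, 30],
                  [30, 20]]
expected_table = expected_frequencies(observed_table)

# Flatten both tables so every cell contributes one term to the sum.
observed_cells = [o for row in observed_table for o in row]
expected_cells = [e for row in expected_table for e in row]
chi2 = chi_square_statistic(observed_cells, expected_cells)
print(expected_table, chi2)
```

The same `chi_square_statistic` function covers the goodness-of-fit case when the expected frequencies come from a claimed distribution instead of a contingency table.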
REVIEW EXERCISES


SECTION 10.1


In Exercises 1-4, (a) identify the claimed distribution and state H₀ and Hₐ, (b) find
the critical value and identify the rejection region, (c) find the chi-square test statistic,
(d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the
decision in the context of the original claim.


1. Results from a previous survey asking parents how much they give for an
allowance are shown in the pie chart. To determine whether the distribution
has changed, you randomly select 1103 parents and ask them how much they
give for an allowance. The results are shown in the table. At α = 0.10,
can you conclude that there has been a change in the claimed or expected
distribution? (Adapted from Echo Research)


[Figure for Exercise 1: pie chart of the previous survey. Less than $10: 29%;
$10 to $20: 16%; Don’t give one/other: 46%; More than $21: percentage not legible.]


Less than $10            353
$10 to $20               167
More than $21             94
Don’t give one/other     489


2. Results from a survey 10 years ago asking people how long their office visits
with a physician were are shown in the pie chart. To determine whether
this distribution has changed, a research organization randomly selects
350 people and asks them how long their office visits with a physician were.
The results are shown in the table. At α = 0.01, can you conclude that there
has been a change in the claimed or expected distribution? (Adapted from
National Center for Health Statistics)


[Figure for Exercise 2: pie chart of the survey from 10 years ago; the percentages are not legible.]


1-5            [not legible]
6-10           62
11-15          126
16-30          129
31-60          23
61 and over    1


3. Results from a previous survey asking golf students what they need the
most help with in golf are shown in the pie chart. To determine whether the
distribution is the same, a golf instructor randomly selects 435 golf students
and asks them what they need the most help with in golf. The results are
shown in the table. At α = 0.05, are the distributions the same? (Adapted
from PGA of America)


[Figure for Exercise 3: pie chart with the categories Approach and swing, Driver
shots, Short-game shots, and Putting; the percentages cannot be matched to the categories.]


[Table for Exercise 3: survey results by response; only the entry "Putting 18" is legible.]
4. An organization believes that the thoughts of adults ages 55 and over on
which industry has the most trustworthy advertising is uniformly distributed.
To test this claim, you randomly select 800 adults ages 55 and over and ask
each which industry has the most trustworthy advertising. The results are
shown in the table. At α = 0.05, can you reject the claim that the distribution is
uniform? (Adapted from Harris Interactive)


Response                        Frequency, f
Auto companies                  128
Fast food companies             192
Financial services companies    [not legible]
Pharmaceutical companies        152
Soft drink companies            16 [possibly truncated]


SECTION 10.2


In Exercises 5-8, use the given contingency table to (a) find the expected 
frequencies of each cell in the table, (b) perform a chi-square test for independence, 
and (c) comment on the relationship between the two variables. Assume the 
variables are independent. If convenient, use technology to solve the problem. 


5. The contingency table shows the results of a random sample of public elementary
and secondary school teachers by gender and years of full-time teaching
experience. Use α = 0.01. (Adapted from U.S. National Center for Education Statistics)


[Table for Exercise 5: only part of the contingency table is legible.]
Male      58   377
Female    152  811  786  701


6. The contingency table shows the results of a random sample of
individuals by gender and type of vehicle owned. Use α = 0.05.


[Table for Exercise 6: column headings are not legible.]
Male      85   6    45
Female    110  75   60   3


7. The contingency table shows the results of a random sample of endangered
and threatened species by status and vertebrate group. Use α = 0.01. (Adapted
from U.S. Fish and Wildlife Service)


[Table for Exercise 7: rows Endangered and Threatened; the entries are not legible.]


8. The contingency table shows the distribution of a random sample of fatal
pedestrian motor vehicle collisions by time of day and gender in a recent year.
Use α = 0.10. (Adapted from National Highway Traffic Safety Administration)
SECTION 10.3


In Exercises 9-12, find the critical F-value for a right-tailed test using the indicated
level of significance α and degrees of freedom d.f.N and d.f.D.


9. α = 0.05, d.f.N = 6, d.f.D = 50          10. α = 0.01, d.f.N = 12, d.f.D = 10
11. α = 0.10, d.f.N = 5, d.f.D = 12         12. α = 0.05, d.f.N = 20, d.f.D = 25


In Exercises 13-16, find the critical F-value for a two-tailed test using the indicated
level of significance α and degrees of freedom d.f.N and d.f.D.


13. α = 0.10, d.f.N = 15, d.f.D = 27        14. α = 0.05, d.f.N = 9, d.f.D = 8
15. α = 0.01, d.f.N = 40, d.f.D = 60        16. α = 0.01, d.f.N = 11, d.f.D = 13


In Exercises 17 and 18, test the claim about the difference between two population
variances σ₁² and σ₂² at the indicated level of significance α using the given sample
statistics. Assume the sample statistics are from independent samples that are
randomly selected and each population has a normal distribution.


17. Claim: σ₁² ≤ σ₂²; α = 0.01. Sample statistics: s₁² = 653, n₁ = 16; s₂² = 270,
n₂ = 21

18. Claim: σ₁² ≠ σ₂²; α = 0.10. Sample statistics: s₁² = 87.3, n₁ = 31; s₂² = 45.5,
n₂ = 29

In Exercises 19-22, test the claim about two population variances at the indicated 
level of significance a. Interpret the results in the context of the claim. If 
convenient, use technology to solve the problem. In each exercise, assume the 
samples are independent and each population has a normal distribution. 


19. An agricultural analyst is comparing the wheat production in Oklahoma 
counties. The analyst claims that the variation in wheat production is greater 
in Garfield County than in Kay County. A random sample of 21 Garfield 
County farms yields a standard deviation of 0.76 bushel per acre. A random 
sample of 16 Kay County farms is found to have a standard deviation of 
0.58 bushel per acre. Test the analyst’s claim at α = 0.10. (Adapted from
Environmental Verification and Analysis Center—University of Oklahoma) 


20. A travel consultant indicates that the standard deviations of hotel room rates 
for San Francisco, CA and Sacramento, CA are the same. A random sample 
of 36 hotel room rates in San Francisco has a standard deviation of $75 and 
a random sample of 31 hotel room rates in Sacramento has a standard 
deviation of $44. At α = 0.01, can you reject the travel consultant’s claim?
(Adapted from I-Map Data Systems LLC) 


21. The table shows the SAT verbal test scores for 9 randomly selected
female students and 13 randomly selected male students. Assume that
SAT verbal test scores are normally distributed. At α = 0.01, test the
claim that the test score variance for females is different from that for
males.


610 340 630 520 690 540 
680 360 530 380 460 630 


800 
730 740 520 560 400 
22. A plastics company that produces automobile dashboard inserts has just
received a new injection mold that is supposedly more consistent than
the company’s current mold. A quality technician wishes to test whether
this new mold will produce inserts that are less variable in diameter
than those produced with the company’s current mold. The table shows
independent random samples (of size 12) of insert diameters (in
centimeters) for both the current and new molds. At α = 0.05, test the
claim that the new mold produces inserts that are less variable in
diameter than the inserts the current mold produces.


[Table for Exercise 22: the 24 diameters are listed in extraction order; the split
between the current mold and the new mold is not recoverable.]
9.611   9.618   9.594   9.580
9.571   9.642   9.650   9.651
9.638   9.568   9.605   9.603
9.570   9.537   9.641   9.625
9.611   9.596   9.647   9.626
9.597   9.636   9.590   9.579


SECTION 10.4


In Exercises 23 and 24, use the given sample data to perform a one-way ANOVA
test using the indicated level of significance α. What can you conclude? Assume
that each sample is drawn from a normal, or approximately normal, population,
that the samples are independent of each other, and that the populations have the
same variances. If convenient, use technology to solve the problem.


23. The table at the right shows the residential electricity cost (in dollars per
million Btu) in one year for a random sample of households in four regions of
the United States. Use α = 0.10 to test for differences among the means for
the four regions. (Adapted from U.S. Energy Information Administration)


[Table for Exercise 23: four groups of eight values, one line per group in extraction
order; the region labels are not legible, and one entry is garbled ("S151").]
40.24   28.18   35.67   34.18   39.03   30.74   32.65   29.98
18.40   26.66   28.27   21.38   24.64   20.15   29.77   25.08
22.85   29.79   18.93   21.81   25.47   23.64   19.82   28.15
35.03   S151    20.28   28.82   24.07   27.60   29.25   18.57


24. The table at the right shows the annual income (in dollars) for a random
sample of families in four regions of the United States. Use α = 0.05 to test for
differences among the means for the four regions. (Adapted from U.S. Census
Bureau)


[Table for Exercise 24: four groups of values, one line per group in extraction
order; the region labels are not legible.]
74,833   66,098   74,961   51,089   71,920
57,116   80,729   78,788   59,543   57,093   72,305   39,019
54,248   77,990   42,293   51,998   40,161   64,698
71,690   64,098   60,307   58,385   61,862   61,609
CHAPTER QUIZ


Take this quiz as you would take a quiz in class. After you are done, check your 
work against the answers given in the back of the book. 


For each exercise,

(a) state H₀ and Hₐ.
(b) specify the level of significance α.
(c) determine the critical value.
(d) determine the rejection region.
(e) find the test statistic.
(f) make a decision.
(g) interpret the results in the context of the problem.

If convenient, use technology to solve the problem.


For Exercises 1 and 2, use the following data. The data list the annual wages
(in thousands of dollars) for randomly selected individuals from three 
metropolitan areas. Assume the wages are normally distributed and that the 
samples are independent. (Adapted from U.S. Bureau of Economic Analysis) 


San Francisco, CA: 64.5, 75.9, 47.5, 52.3, 45.9, 59.7, 71.2, 74.1, 65.4, 61.9, 60.9, 
58.7, 54.6 


Baltimore, MD: 45.9, 39.8, 46.2, 44.9, 37.5, 52.9, 57.5, 49.7, 48.1, 47.9, 55.9, 
35.5, 39.9, 40.9, 45.4 
Jacksonville, FL: 31.3, 29.3, 39.2, 45.7, 34.9, 31.5, 41.5, 49.8, 52.6, 40.1, 42.9, 
33.4, 30.5, 50.2, 34.7 

1. At α = 0.01, is there enough evidence to conclude that the variances in
annual wages for San Francisco, CA and Baltimore, MD are different?


2. Are the mean annual wages equal for all three cities? Use α = 0.10. Assume
that the population variances are equal.


For Exercises 3 and 4, use the following table. The table lists the distribution
of educational achievement for people in the United States ages 25 and older. It
also lists the results of a random survey for two additional age categories. (Adapted
from U.S. Census Bureau)


[Table for Exercises 3 and 4: the row labels and column headings are not legible. The
first column is the distribution for ages 25 and older; the other two columns are
the survey results for the two additional age categories.]
13.4%    36    86
31.2%    92    161
17.2%    35    [not legible]
 8.8%    32    27
19.1%    70    60
10.3%    36    45


3. Does the distribution for people in the United States ages 25 and older
differ from the distribution for people in the United States ages 35-44?
Use α = 0.05.


4. Does the distribution for people in the United States ages 25 and older
differ from the distribution for people in the United States ages 65-74?
Use α = 0.01.
REAL STATISTICS – REAL DECISIONS: PUTTING IT ALL TOGETHER


The National Fraud Information Center (NFIC) was established in
1992 by the National Consumers League (NCL) to combat the growing
problem of telemarketing fraud by improving prevention and
enforcement. NCL works to protect and promote social and economic
justice for consumers and workers in the United States and abroad.

You work for the NCL’s Fraud Center as a statistical analyst. You
are studying data on telemarketing fraud. Part of your analysis involves
testing the goodness of fit, testing for independence, comparing
variances, and performing ANOVA.


1. Goodness of Fit


A claimed distribution for the ages of telemarketing fraud victims
is shown in the table below. The results of a survey of 1000
randomly selected telemarketing fraud victims are also shown in the
table. Using α = 0.01, perform a chi-square goodness-of-fit test to
test the claimed distribution. What can you conclude? Do you think
the claimed distribution is valid? Why or why not?


TABLE FOR EXERCISE 1

Ages        Claimed distribution    Survey results
Under 20     1%                      30
20-29       14%                     200
30-39       17%                     300
40-49       18%                     270
50-59       18%                     150
60-69       12%                      40
70+         20%                      10


2. Independence


The following contingency table shows the results of a random
sample of 2000 telemarketing fraud victims classified by age and
type of fraud. The frauds were committed using bogus sweepstakes
or credit card offers.


(a) Calculate the expected frequency for each cell in the
contingency table. Assume the variables age and type of fraud
are independent.


(b) Can you conclude that the ages of the victims are related to the
type of fraud? Use α = 0.01.


                                          Age
Type of fraud   Under 20   20-29   30-39   40-49   50-59   60-69   70-79   80+    Total
Sweepstakes     10         60      70      130     90      160     280     200    1000
Credit cards    20         180     260     240     180     70      30      20     1000
Total           30         240     330     370     270     230     310     220    2000
TECHNOLOGY     MINITAB     TI-83/84 PLUS


TEACHER SALARIES


In 1916, the American Federation of Teachers (AFT)
was formed by three teacher groups in Chicago, Illinois
and locals from Gary, Indiana; New York, New York;
Scranton, Pennsylvania; and Washington, D.C. Today, the
AFT represents over 1.4 million teachers, higher education
faculty and staff, school support staff, state and municipal
employees, and health care professionals.


Each year, the AFT publishes the Survey and Analysis of
Teacher Salary Trends. This report focuses on national trends
in teacher salaries, state comparisons, beginning teacher
salaries, and salary data and living costs for the nation’s
50 largest cities.

The table below shows the salaries of a random
sample of teachers from California, Ohio, and Texas.


Teacher salaries
California    Ohio      Texas
66,645        49,400    45,300
56,622        50,500    47,800
47,400        46,750    43,400
65,000        62,125    39,605
52,150        68,900    52,425
69,200        45,300    57,200
74,400        49,080    45,000
59,400        54,525    50,150
68,378        51,400    39,500
64,873        49,800    41,680
59,395        48,950    43,075
62,000        69,300    46,450
69,200        41,860    51,970
73,400        56,526    41,795
71,405        57,900    40,100
58,200        54,300    40,822


EXERCISES


In Exercises 1-3, refer to the following samples.
Use α = 0.05.

(a) California teachers
(b) Ohio teachers
(c) Texas teachers


1. Are the samples independent of each other?
Explain.


2. Use a technology tool to determine whether
each sample is from a normal, or approximately
normal, population.


3. Use a technology tool to determine whether the
samples were selected from populations having
equal variances.


4. Using the results of Exercises 1-3, discuss
whether the three conditions for a one-way
ANOVA test are satisfied. If so, use a technology
tool to test the claim that teachers from
California, Ohio, and Texas have the same mean
salary. Use α = 0.05.


5. Repeat Exercises 1-4 using the data in the table
below. The table displays the salaries of random
samples of teachers from Alaska, Nevada, and
New York.


[Column headings are not legible; the order Alaska, Nevada, New York follows the exercise text.]
Alaska    Nevada    New York
64,350    72,320    78,106
55,425    55,750    87,500
72,100    36,800    89,100
54,900    45,556    42,300
37,160    66,800    47,350
48,868    44,874    39,781
62,585    44,805    37,660
56,185    41,452    54,100
54,232    49,678    93,100
50,252    54,459    38,150
49,269    63,084    90,100
40,160    45,465    40,108
42,585    35,699    45,714
53,495    47,505    83,850
50,262    45,002    38,099
82,870    41,387    48,102


Extended solutions are given in the Technology Supplement.
Technical instruction is provided for MINITAB, Excel, and the TI-83/84 Plus.
NONPARAMETRIC TESTS


11.1 The Sign Test
11.2 The Wilcoxon Tests
     CASE STUDY
11.3 The Kruskal-Wallis Test
11.4 Rank Correlation
11.5 The Runs Test
     USES AND ABUSES
     REAL STATISTICS – REAL DECISIONS
     TECHNOLOGY


In a recent year, the most common form 
of reported identity theft was credit card 
fraud (20%), followed by government 
documents/benefits fraud (15%), 
employment fraud (15%), and 

phone or utilities fraud (13%). 
WHERE YOU’VE BEEN


Up to this point in the text, you have studied
dozens of different statistical formulas and tests
that can help you in a decision-making process.
Specific conditions had to be satisfied in order to
use these formulas and tests.


Suppose it is believed that as the number of
fraud complaints in a state increases, the number
of identity theft victims also increases. Can this
belief be supported by actual data? The data
below show the number of fraud complaints
and the number of identity theft victims for
25 randomly selected states in a recent year.


[Table: fraud complaints and identity theft victims by state. The rows are
reproduced as extracted, in three blocks of two rows each. The row labels are
missing, only 24 of the 25 columns survive, and the order of the entries in the
last row is uncertain.]

22,805   1535     10,556   8099     106,623  2630     8978
8363     296      3292     2696     51,140   759      3819

5895     57,472   19,585   15,159   2253     12,584   4807     7101
1347     24,440   5412     4589     490      2937     2081     2005

8173     24,695   7345     15,515   21,730   20,610   13,259   4498     30,578
2396     6349     1775     5408     5855     9683     3528     2367     13,726


WHERE YOU’RE GOING


In this chapter, you will study additional statistical
tests that do not require the population distribution
to meet any specific conditions. Each of these tests
has usefulness in real-life applications.


With the data above, the number of fraud
complaints F and the number of identity theft
victims V can be related by the regression
equation V = 0.472F − 1802.101. The correlation
coefficient is approximately 0.987, so there is a
strong positive correlation. You can determine
that the correlation is significant by using
Table 11 in Appendix B, but the V-values do not
pass the normality requirement.


So, although a simple correlation test might
indicate a relationship between the number of
fraud complaints and the number of identity
theft victims, one might question the results
because the data do not fit the requirements
for the test. Similar tests you will study in this
chapter, such as Spearman’s rank correlation
test, will give you additional information. The
Spearman’s rank correlation coefficient for these
data is approximately 0.971. At α = 0.01, there is
in fact a significant correlation between the
number of fraud complaints and the number of
identity theft victims for each state.


[Figure: scatter plot titled "Number of Fraud Complaints and Identity Theft Victims
for 25 States". Horizontal axis: Fraud complaints. Vertical axis: Identity theft victims.]
The Sign Test


WHAT YOU SHOULD LEARN


» How to use the sign test to
test a population median


» How to use the paired-sample
sign test to test the difference
between two population
medians (dependent samples)


INSIGHT


For many nonparametric
tests, statisticians test
the median instead
of the mean.


The Sign Test for a Population Median » The Paired-Sample Sign Test


» THE SIGN TEST FOR A POPULATION MEDIAN


Many of the hypothesis tests studied so far have imposed one or more 
requirements for a population distribution. For instance, some tests require that 
a population must have a normal distribution, and other tests require that 
population variances be equal. What if, for a given test, such requirements cannot 
be met? For these cases, statisticians have developed hypothesis tests that are 
“distribution free.” Such tests are called nonparametric tests. 


DEFINITION 


A nonparametric test is a hypothesis test that does not require any specific 
conditions concerning the shapes of population distributions or the values of 
population parameters. 


Nonparametric tests are usually easier to perform than corresponding 
parametric tests. However, they are usually less efficient than parametric tests. 
Stronger evidence is required to reject a null hypothesis using the results of 
a nonparametric test. Consequently, whenever possible, you should use 
a parametric test. One of the easiest nonparametric tests to perform is the 
sign test. 


DEFINITION 


The sign test is a nonparametric test that can be used to test a population 
median against a hypothesized value k. 


The sign test for a population median can be left-tailed, right-tailed, or 
two-tailed. The null and alternative hypotheses for each type of test are as 
follows. 


Left-tailed test:
H₀: median ≥ k and Hₐ: median < k


Right-tailed test:
H₀: median ≤ k and Hₐ: median > k


Two-tailed test:
H₀: median = k and Hₐ: median ≠ k


To use the sign test, first compare each entry in the sample with the 
hypothesized median k. If the entry is below the median, assign it a − sign; if the
entry is above the median, assign it a + sign; and if the entry is equal to the
median, assign it a 0. Then compare the number of + and − signs. (The 0’s are
ignored.) If there is a large difference between the number of + signs and the
number of − signs, it is likely that the median is different from the hypothesized
value and the null hypothesis should be rejected. 
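
The sign-assignment step can be sketched in Python as follows. The function names, the sample, and the hypothesized median are hypothetical and serve only to illustrate the procedure described above.

```python
def assign_signs(sample, k):
    """Compare each entry with the hypothesized median k."""
    signs = []
    for entry in sample:
        if entry < k:
            signs.append("-")   # entry below the hypothesized median
        elif entry > k:
            signs.append("+")   # entry above the hypothesized median
        else:
            signs.append("0")   # ties are recorded but ignored later
    return signs


def count_signs(signs):
    """Return (number of + signs, number of - signs); 0's are ignored."""
    return signs.count("+"), signs.count("-")


# Hypothetical sample and hypothesized median.
sample = [12, 15, 9, 10, 14, 8]
signs = assign_signs(sample, k=10)
plus, minus = count_signs(signs)
n = plus + minus  # sample size used by the sign test
print(signs, plus, minus, n)
```

Because the tie at 10 is dropped, the sample size used by the test is 5 rather than 6.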
INSIGHT


Because the 0’s are ignored, there
are two possible outcomes when
comparing a data entry with a
hypothesized median: a + or a −
sign. If the median is k, then about
half of the values will be above k
and half will be below. As such,
the probability for each sign is
0.5. Table 8 in Appendix B is
constructed using the binomial
distribution where p = 0.5.


When n > 25, you can use
the normal approximation
(with a continuity
correction) for the
binomial. In this case,
use μ = np = 0.5n
and σ = √(npq) = √n / 2.
Table 8 in Appendix B lists the critical values for the sign test for selected
levels of significance and sample sizes. When the sign test is used, the sample size
n is the total number of + and − signs. If the sample size is greater than 25, you
can use the standard normal distribution to find the critical values.


TEST STATISTIC FOR THE SIGN TEST


When n ≤ 25, the test statistic for the sign test is x, the smaller number of
+ or − signs.


When n > 25, the test statistic for the sign test is

\[ z = \frac{(x + 0.5) - 0.5n}{\sqrt{n}/2} \]

where x is the smaller number of + or − signs and n is the sample size, i.e., the
total number of + and − signs.
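
As a rough sketch, the two cases of the test statistic can be combined in one Python function. The function name and the sign counts below are hypothetical.

```python
import math


def sign_test_statistic(plus, minus):
    """Return x when n <= 25, otherwise the continuity-corrected z."""
    n = plus + minus          # 0's have already been discarded
    x = min(plus, minus)      # smaller number of + or - signs
    if n <= 25:
        return x
    return ((x + 0.5) - 0.5 * n) / (math.sqrt(n) / 2)


# Hypothetical sign counts.
small_sample = sign_test_statistic(plus=9, minus=4)     # n = 13, so x is returned
large_sample = sign_test_statistic(plus=30, minus=19)   # n = 49, so z is returned
print(small_sample, round(large_sample, 3))
```

Returning either x or z from one function keeps the sketch short; in practice it may be clearer to keep the two cases separate, since they are compared with critical values from different tables.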


Because x is defined to be the smaller number of + or − signs, the rejection
region is always in the left tail. Consequently, the sign test for a population
median is always a left-tailed test or a two-tailed test. When the test is two-tailed,
use only the left-tailed critical value. (If x is defined to be the larger number of
+ or − signs, the rejection region is always in the right tail. Right-tailed sign tests
are presented in the exercises.)


GUIDELINES


Performing a Sign Test for a Population Median


IN WORDS                                   IN SYMBOLS

1. Identify the claim. State the null      State H₀ and Hₐ.
   and alternative hypotheses.

2. Specify the level of significance.      Identify α.

3. Determine the sample size n by          n = total number of
   assigning + signs and − signs to        + and − signs
   the sample data.

4. Determine the critical value.           If n ≤ 25, use Table 8 in
                                           Appendix B.
                                           If n > 25, use Table 4 in
                                           Appendix B.

5. Find the test statistic.                If n ≤ 25, use x.
                                           If n > 25, use
                                           z = ((x + 0.5) − 0.5n) / (√n / 2).

6. Make a decision to reject or fail       If the test statistic is less than
   to reject the null hypothesis.          or equal to the critical value,
                                           reject H₀. Otherwise, fail to
                                           reject H₀.

7. Interpret the decision in the
   context of the original claim.
EXAMPLE 1


» Using the Sign Test


A website administrator for a company claims that the median number of
visitors per day to the company’s website is no more than 1500. An employee
doubts the accuracy of this claim. The number of visitors per day for
20 randomly selected days are listed below. At α = 0.05, can the employee
reject the administrator’s claim?


1469 1462 1634 1602 1500
1463 1476 1570 1544 1452
1487 1523 1525 1548 1511
1579 1620 1568 1492 1649


> Solution


The claim is “the median number of visitors per day to the company’s website
is no more than 1500.” So, the null and alternative hypotheses are


H₀: median ≤ 1500 (Claim) and Hₐ: median > 1500.


The results of comparing each data entry with the hypothesized median 1500
are shown.


− − + + 0
− − + + −
− + + + +
+ + + − +


You can see that there are 7 − signs and 12 + signs. So, n = 12 + 7 = 19.
Because n ≤ 25, use Table 8 to find the critical value. The test is a one-tailed
test with α = 0.05 and n = 19. So, the critical value is 5. Because n ≤ 25, the
test statistic x is the smaller number of + or − signs. So, x = 7. Because
x = 7 is greater than the critical value, the employee should fail to reject the
null hypothesis.


Interpretation There is not enough evidence at the 5% level of significance 
for the employee to reject the website administrator’s claim that the median 
number of visitors per day to the company’s website is no more than 1500. 
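
The critical value used in Example 1 can be checked with a short Python sketch. It assumes, following the Insight note, that the one-tailed critical value is the largest c with P(X ≤ c) ≤ α for a binomial distribution with p = 0.5. The function name is hypothetical, and whether the printed Table 8 follows exactly this rule for every n and α is an assumption.

```python
from math import comb


def sign_test_critical_value(n, alpha):
    """Largest c with P(X <= c) <= alpha for X ~ Binomial(n, 0.5), or None."""
    cumulative = 0.0
    critical = None
    for c in range(n + 1):
        cumulative += comb(n, c) / 2 ** n   # P(X = c) when p = 0.5
        if cumulative > alpha:
            break
        critical = c
    return critical


# Values from Example 1: 12 + signs and 7 - signs, one-tailed test at alpha = 0.05.
n, x, alpha = 19, 7, 0.05
critical = sign_test_critical_value(n, alpha)
reject = x <= critical
print(critical, reject)
```

For n = 19 and α = 0.05 the sketch gives a critical value of 5, which matches the value read from Table 8 in the example, so x = 7 does not fall in the rejection region.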


> Try It Yourself 1


A real estate agency claims that the median number of days a home is on
the market in its city is greater than 120. A homeowner wants to verify the
accuracy of this claim. The number of days on the market for 24 randomly
selected homes is shown below. At α = 0.025, can the homeowner support the
agency’s claim?


118 167 72 79 76 106 102 113
73 119 162 114 120 93 135 147
77 157 115 88 152 70 65 91


a. Identify the claim and state H₀ and Hₐ.
b. Specify the level of significance α.
c. Determine the sample size n.
d. Determine the critical value.
e. Find the test statistic x.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.
Answer: Page A47
In 2008, people in the United States spent a total of about $15.5 billion on
candy. The U.S. Department of Commerce reported that in 2008, the average
person in the United States ate about 22.4 pounds of candy.

[Figure: Candy Consumption. Vertical axis: Consumption (in pounds per person).]

If you were to test the U.S. Department of Commerce’s claim concerning per capita
candy consumption, would you use a parametric test or a nonparametric test? What
factors must you consider?


EXAMPLE 2


> Using the Sign Test


An organization claims that the median annual attendance for museums
in the United States is at least 39,000. A random sample of 125 museums
reveals that the annual attendances for 79 museums were less than 39,000, the
annual attendances for 42 museums were more than 39,000, and the annual
attendances for 4 museums were 39,000. At α = 0.01, is there enough evidence
to reject the organization’s claim? (Adapted from American Association of Museums)


> Solution


The claim is “the median annual attendance for museums in the United States
is at least 39,000.” So, the null and alternative hypotheses are


H₀: median ≥ 39,000 (Claim) and Hₐ: median < 39,000.


Because n > 25, use Table 4, the Standard Normal Table, to find the critical
value. Because the test is a left-tailed test with α = 0.01, the critical value is
z₀ = −2.33. Of the 125 museums, there are 79 − signs and 42 + signs. When
the 0’s are ignored, the sample size is


n = 79 + 42 = 121, and x = 42.


With these values, the test statistic is

\[ z = \frac{(42 + 0.5) - 0.5(121)}{\sqrt{121}/2} = \frac{-18}{5.5} \approx -3.27. \]


The graph at the right shows the location
of the rejection region and the test statistic
z. Because z is less than the critical value,
it is in the rejection region. So, you should
reject the null hypothesis.


Interpretation There is enough evidence
at the 1% level of significance to reject
the organization’s claim that the median
annual attendance for museums in the
United States is at least 39,000.
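
The arithmetic in Example 2 can be reproduced with Python's standard library. This is a sketch rather than the textbook's method: `NormalDist().inv_cdf` stands in for the Table 4 lookup, so the critical value comes out as about −2.326 instead of the rounded table value −2.33.

```python
from math import sqrt
from statistics import NormalDist

# Values from Example 2: 79 - signs, 42 + signs, left-tailed test at alpha = 0.01.
minus, plus, alpha = 79, 42, 0.01
n = minus + plus            # 121, the four 0's are ignored
x = min(minus, plus)        # 42, the smaller number of signs

# Test statistic with continuity correction.
z = ((x + 0.5) - 0.5 * n) / (sqrt(n) / 2)

# Left-tailed critical value from the standard normal distribution.
z0 = NormalDist().inv_cdf(alpha)

reject = z <= z0
print(round(z, 2), round(z0, 2), reject)
```

The test statistic is far below the critical value, which agrees with the decision to reject the null hypothesis in the example.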


> Try It Yourself 2


An organization claims that the median age of automobiles in operation in the
United States is 9.4 years. A random sample of 95 automobiles reveals that
41 automobiles were less than 9.4 years old and 51 automobiles were more
than 9.4 years old. At α = 0.10, can you reject the organization’s claim?
(Source: Bureau of Transportation Statistics)


a. Identify the claim and state H₀ and Hₐ.
b. Specify the level of significance α.
c. Determine the sample size n.
d. Determine the critical value.
e. Find the test statistic z.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.
Answer: Page A47


STUDY TIP


When performing a
two-tailed sign test,
remember to use only
the left-tailed
critical value.
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> THE PAIRED-SAMPLE SIGN TEST 


In Section 8.3, you learned how to use a t-test for the difference between means
of dependent samples. That test required both populations to be normally 
distributed. If the parametric condition of normality cannot be satisfied, you can 
use the paired-sample sign test to test the difference between two population 
medians. To perform the paired-sample sign test for the difference between two 
population medians, the following conditions must be met. 


1. A sample must be randomly selected from each population. 


2. The samples must be dependent (paired). 


The paired-sample sign test can be left-tailed, right-tailed, or two-tailed. 
This test is similar to the sign test for a single population median. However, 
instead of comparing each data entry with a hypothesized median and recording 
a +, −, or 0, you find the difference between corresponding data entries and
record the sign of the difference. Generally, to find the difference, subtract the
entry representing the second variable from the entry representing the first
variable. Then compare the number of + and − signs. (The 0’s are ignored.) If
the number of + signs is approximately equal to the number of − signs, the null
hypothesis should not be rejected. If, however, there is a significant difference
between the number of + signs and the number of − signs, the null hypothesis
should be rejected. 


GUIDELINES
Performing a Paired-Sample Sign Test


1. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H₀ and Hₐ.

2. Specify the level of significance.
   In symbols: Identify α.

3. Determine the sample size n by finding the difference for each data pair.
   Assign a + sign for a positive difference, a − sign for a negative
   difference, and a 0 for no difference.
   In symbols: n = total number of + and − signs

4. Determine the critical value.
   In symbols: Use Table 8 in Appendix B.

5. Find the test statistic.
   In symbols: x = smaller number of + or − signs

6. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If the test statistic is less than or equal to the critical
   value, reject H₀. Otherwise, fail to reject H₀.

7. Interpret the decision in the context of the original claim.


EXAMPLE 3


» Using the Paired-Sample Sign Test 


A psychologist claims that the number of repeat offenders will decrease if 
first-time offenders complete a particular rehabilitation course. You randomly 
select 10 prisons and record the number of repeat offenders during a two-year 
period. Then, after first-time offenders complete the course, you record the 
number of repeat offenders at each prison for another two-year period. The 
results are shown in the following table. At α = 0.025, can you support the
psychologist’s claim? 


> Solution 
To support the psychologist’s claim, you could use the following null and 
alternative hypotheses. 

H₀: The number of repeat offenders will not decrease.

Hₐ: The number of repeat offenders will decrease. (Claim)


The table below shows the sign of the differences between the “before” and 
“after” data. 


You can see that there is 1 − sign and there are 9 + signs. So, n = 1 + 9 = 10.
In Table 8 with α = 0.025 (one-tailed) and n = 10, the critical value is 1. The
test statistic x is the smaller number of + or − signs. So, x = 1. Because x is
equal to the critical value, you should reject the null hypothesis.


Interpretation There is enough evidence at the 2.5% level of significance to 
support the psychologist’s claim that the number of repeat offenders will 
decrease. 
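
The decision in this example can be cross-checked without Table 8. The sketch below is an illustration in Python, not the textbook's procedure: it assumes that, under the null hypothesis, the number of the less frequent sign follows a binomial distribution with n trials and p = 0.5, and it takes the one-tailed critical value to be the largest x whose lower-tail probability does not exceed α. The function names are hypothetical.

```python
from math import comb


def lower_tail_probability(x, n):
    """P(X <= x) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(x + 1)) / 2 ** n


def sign_test_critical_value(n, alpha):
    """Largest x with P(X <= x) <= alpha, or None if no such x exists."""
    critical = None
    for x in range(n + 1):
        if lower_tail_probability(x, n) <= alpha:
            critical = x
        else:
            break
    return critical


# Example 3: 9 plus signs and 1 minus sign, alpha = 0.025 (one-tailed)
plus, minus = 9, 1
n = plus + minus
x = min(plus, minus)
p_value = lower_tail_probability(x, n)
critical = sign_test_critical_value(n, 0.025)
print(n, x, critical, p_value)
```

Under these assumptions the computed critical value agrees with the value 1 quoted from Table 8, and the tail probability for x = 1 is below 0.025, which matches the decision to reject the null hypothesis. The sketch does not check the direction of the differences; that still has to be confirmed against the claim.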


> Try It Yourself 3 


A medical researcher claims that a new vaccine will decrease the number of 
colds in adults. You randomly select 14 adults and record the number of colds 
each has in a one-year period. After giving the vaccine to each adult, you again 
record the number of colds each has in a one-year period. The results are shown 
in the table at the left. At α = 0.05, can you support the researcher’s claim?


a. Identify the claim and state H₀ and Hₐ.
b. Specify the level of significance α.
c. Determine the sample size n.
d. Determine the critical value.
e. Find the test statistic x.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.
Answer: Page A47
NONPARAMETRIC TESTS


BUILDING BASIC SKILLS AND VOCABULARY

1. What is a nonparametric test? How does a nonparametric test differ from a
parametric test? What are the advantages and disadvantages of using a
nonparametric test?

2. When the sign test is used, what population parameter is being tested?

3. Describe the test statistic for the sign test when the sample size n is less than
or equal to 25 and when n is greater than 25.

4. In your own words, explain why the hypothesis test discussed in this section is
called the sign test.

5. Explain how to use the sign test to test a population median.

6. List the two conditions that must be met in order to use the paired-sample
sign test.


USING AND INTERPRETING CONCEPTS


Performing a Sign Test In Exercises 7-22, (a) identify the claim and state H₀
and Hₐ, (b) determine the critical value, (c) find the test statistic, (d) decide whether
to reject or fail to reject the null hypothesis, and (e) interpret the decision in the 
context of the original claim. 


7. Credit Card Charges In order to estimate the median amount of new
credit card charges for the previous month, a financial service accountant
randomly selects 12 credit card accounts and records the amount of new
charges for each account for the previous month. The amounts (in dollars) are
listed below. At α = 0.01, can the accountant conclude that the median
amount of new credit card charges for the previous month was more than
$300? (Adapted from Board of Governors of the Federal Reserve System)


346.71 382.59 255.03 202.17 309.80 265.88 
299.41 270.38 296.54 318.46 245.92 309.47 


8. Temperature A meteorologist estimates that the median daily high
temperature for the month of July in Pittsburgh is 83° Fahrenheit. The high
temperatures (in degrees Fahrenheit) for 15 randomly selected July days in
Pittsburgh are listed below. At α = 0.01, is there enough evidence to reject the
meteorologist’s claim? (Adapted from U.S. National Oceanic and Atmospheric
Administration)


74 79 81 86 90 79 81 83 81 74 78 76 84 82 85 


9. Sales Prices of Homes A real estate agent believes that the median sales
price of new privately owned one-family homes sold in the past year is
$198,000 or less. The sales prices (in dollars) of 10 randomly selected homes
are listed below. At α = 0.05, is there enough evidence to reject the agent’s
claim? (Adapted from National Association of Realtors)


205,800 234,500 210,900 195,700 145,200
198,900 254,000 175,900 189,500 212,500


10. Temperature During a weather report, a meteorologist states that the
median daily high temperature for the month of January in San Diego is
66° Fahrenheit. The high temperatures (in degrees Fahrenheit) for 16
randomly selected January days in San Diego are listed below. At α = 0.01,
can you reject the meteorologist’s claim? (Adapted from U.S. National Oceanic
and Atmospheric Administration)


78 74 72 72 70 70 72 78 74 71 72 74 77 79 75 73


11. Credit Card Debt A financial services institution reports that the median
amount of credit card debt for families holding such debts is at least $3000.
In a random sample of 104 families holding debt, the debts of 60 families
were less than $3000 and the debts of 44 families were greater than $3000. At
α = 0.02, can you reject the institution’s claim? (Adapted from Board of
Governors of the Federal Reserve System)


12. Financial Debt A financial services accountant estimates that the median
amount of financial debt for families holding such debts is less than $65,000.
In a random sample of 70 families holding debts, the debts of 24 families
were less than $65,000 and the debts of 46 families were greater than $65,000.
At α = 0.025, can you support the accountant’s estimate? (Adapted from
Board of Governors of the Federal Reserve System)


13. Twitter® Users A research group claims that the median age of Twitter®
users is greater than 30 years old. In a random sample of 24 Twitter® users,
11 are less than 30 years old, 10 are more than 30 years old, and 3 are 30 years
old. At α = 0.01, can you support the research group’s claim? (Adapted from
Pew Research Center)


14. Facebook® Users A research group claims that the median age of Facebook®
users is less than 32 years old. In a random sample of 20 Facebook® users,
5 are less than 32 years old, 13 are more than 32 years old, and 2 are 32 years
old. At α = 0.05, can you support the research group’s claim? (Adapted from
Pew Research Center)


15. Unit Size A renters’ organization claims that the median number of rooms
in renter-occupied units is four. You randomly select 120 renter-occupied
units and obtain the results shown below. At α = 0.05, can you reject the
organization’s claim? (Adapted from U.S. Census Bureau)


TABLE FOR EXERCISE 15
Fewer than 4 rooms |
4 rooms | 40
More than 4 rooms | 49

TABLE FOR EXERCISE 16
Less than 1350 |
1350 | 3
More than 1350 | 12


16. Square Footage A renters’ organization believes that the median square
footage of renter-occupied units is 1350 square feet. To test this claim, you
randomly select 22 renter-occupied units and obtain the results shown above.
At α = 0.10, can you reject the organization’s claim? (Adapted from U.S.
Census Bureau)


17. Hourly Wages A labor organization estimates that the median hourly wage
of computer systems analysts is $37.06. In a random sample of 45 computer
systems analysts, 18 are paid less than $37.06 per hour, 25 are paid more than
$37.06 per hour, and 2 are paid $37.06 per hour. At α = 0.01, can you reject
the labor organization’s claim? (Adapted from U.S. Bureau of Labor Statistics)




18. Hourly Wages A labor organization estimates that the median hourly wage
of podiatrists is at least $55.89. In a random sample of 23 podiatrists, 17 are
paid less than $55.89 per hour, 5 are paid more than $55.89 per hour, and 1 is
paid $55.89 per hour. At α = 0.05, can you reject the labor organization’s
claim? (Adapted from U.S. Bureau of Labor Statistics)


19. Lower Back Pain The table shows the lower back pain intensity scores for
eight patients before and after receiving acupuncture for eight weeks. At
α = 0.05, is there enough evidence to conclude that the lower back pain
intensity scores decreased after the acupuncture? (Adapted from Archives of
Internal Medicine)


Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Intensity score (before) | 59.2 | 46.3 | 65.4 | 74.0 | 79.3 | 81.6 | 44.4 | 59.1
Intensity score (after) | 12.4 | 22.5 | 18.6 | 59.3 | 70.1 | 70.2 | 13.2 | 25.9


20. Lower Back Pain The table shows the lower back pain intensity scores
for 12 patients before and after taking anti-inflammatory drugs for 8
weeks. At α = 0.05, is there enough evidence to conclude that the lower
back pain intensity scores decreased after taking anti-inflammatory
drugs? (Adapted from Archives of Internal Medicine)


Patient | 1 | 2 | 3 | 4 | 5 | 6
Intensity score (before) | 71.0 | 42.1 | 79.1 | 57.5 | 64.0 | 60.4
Intensity score (after) | 60.1 | 23.4 | 86.2 | 62.1 | 44.2 | 49.7

Patient | 7 | 8 | 9 | 10 | 11 | 12
Intensity score (before) | 68.3 | 95.2 | 48.1 | 78.6 | 65.4 | 59.9
Intensity score (after) | 58.3 | 72.6 | 51.8 | 82.5 | 63.2 | 47.9


21. Improving SAT Scores A tutoring agency believes that by completing
a special course, students can improve their critical reading SAT scores.
In part of a study, 12 students take the critical reading part of the SAT,
complete the special course, then take the critical reading part of the
SAT again. The students’ scores are shown below. At α = 0.05, is there
enough evidence to conclude that the students’ critical reading SAT
scores improved?
How do you feel relative to your real age?
Adapted from USA TODAY Snapshot, September 1, 2009.
FIGURE FOR EXERCISE 23


FIGURE FOR EXERCISE 24


22. SAT Scores Students at a certain school are required to take the SAT
twice. The table shows both critical reading SAT scores for 12 students.
At α = 0.01, can you conclude that the students’ critical reading scores
improved the second time they took the SAT?


477 | 325 | 513 | 636 | 571
532 | 299 | 501 | 648 | 603


23. Feeling Your Age A research organization conducts a survey by randomly 
selecting adults and asking them how they feel relative to their real age. The 
results are shown in the figure. (Adapted from Pew Research Center) 


(a) Use a sign test to test the null hypothesis that the proportion of adults 
who feel older than their real age is equal to the proportion of adults 
who feel younger than their real age. Assign a + sign to adults who feel 
older than their real age, assign a − sign to adults who feel younger than
their real age, and assign a 0 to adults who feel their age. Use α = 0.05.


(b) What can you conclude? 


24. Contacting Parents A research organization conducts a survey by randomly 
selecting adults and asking them how frequently they contact their parents 
by phone. The results are shown in the figure. (Adapted from Pew Research 
Center) 


(a) Use a sign test to test the null hypothesis that the proportion of adults 
who contact their parents by phone weekly is equal to the proportion 
of adults who contact their parents by phone daily. Assign a + sign to 
adults who contact their parents by phone weekly, assign a − sign to
adults who contact their parents by phone daily, and assign a 0 to adults
who answer “other.” Use α = 0.05.


(b) What can you conclude? 


In Exercises 25 and 26, use StatCrunch to help you test the claim about the 
population median. 


25. Hourly Wages A labor organization claims that the median hourly wage 
of tool and die makers is $22.55. The hourly wages (in dollars) of 14 randomly 
selected tool and die makers are listed below. At α = 0.05, is there enough
evidence to reject the labor organization’s claim? (Adapted from U.S. Bureau 
of Labor Statistics) 


21.75 23.10 20.50 25.80 29.25 26.35 27.40 
22.90 23.50 22.55 32.70 30.05 29.80 34.85 


26. Viewing Audience A television network claims that the median age of 
viewers for the Masters Golf Tournament is greater than 57 years. The ages 
of 24 randomly selected viewers are listed below. At α = 0.01, is there
enough evidence to support the network’s claim? (Adapted from ESPN) 


60 85 70 59 42 21 57 25 65 71 33 40
54 50 57 49 50 30 27 57 17 90 35 46
EXTENDING CONCEPTS


More on Sign Tests When you are using a sign test for n > 25 and the test
is left-tailed, you know you can reject the null hypothesis if the test statistic

$$z = \frac{(x + 0.5) - 0.5n}{\sqrt{n}/2}$$

is less than or equal to the left-tailed critical value, where x is the smaller number
of + or − signs. For a right-tailed test, you can reject the null hypothesis if the
test statistic

$$z = \frac{(x - 0.5) - 0.5n}{\sqrt{n}/2}$$

is greater than or equal to the right-tailed critical value, where x is the larger
number of + or − signs.
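
As an illustration of how the two continuity-corrected statistics relate, the following Python sketch evaluates both forms. The function name `sign_test_z` and the `tail` argument are hypothetical; the counts reuse the 42 and 79 signs from the museum example earlier in the section, not data from the exercises.

```python
from math import sqrt


def sign_test_z(x, n, tail):
    """Large-sample sign test statistic with continuity correction.

    tail="left":  x is the smaller number of + or - signs.
    tail="right": x is the larger number of + or - signs.
    """
    if tail == "left":
        numerator = (x + 0.5) - 0.5 * n
    elif tail == "right":
        numerator = (x - 0.5) - 0.5 * n
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return numerator / (sqrt(n) / 2)


n = 121
z_left = sign_test_z(42, n, "left")     # smaller count of signs
z_right = sign_test_z(79, n, "right")   # larger count of signs
print(round(z_left, 2), round(z_right, 2))
```

For the same sample, the two statistics are mirror images of each other, so the choice of formula only has to match the tail of the test.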


In Exercises 27-30, (a) write the claim mathematically and identify H₀ and
Hₐ, (b) determine the critical value, (c) find the test statistic, (d) decide whether to
reject or fail to reject the null hypothesis, and (e) interpret the decision in the 
context of the original claim. 


27. Weekly Earnings A labor organization claims that the median weekly 
earnings of female workers is less than or equal to $638. To test this claim, 
you randomly select 50 female workers and ask each to provide her weekly 
earnings. The results are shown in the table. At α = 0.01, can you reject the
organization’s claim? (Adapted from U.S. Bureau of Labor Statistics) 


TABLE FOR EXERCISE 27
Less than $638 |
$638 | 3
More than $638 | 29

TABLE FOR EXERCISE 28
Less than $798 |
$798 | 2
More than $798 | 45


28. Weekly Earnings A labor organization states that the median weekly 
earnings of male workers is greater than $798. To test this claim, you 
randomly select 70 male workers and ask each to provide his weekly 
earnings. The results are shown in the table. At α = 0.01, can you support the
organization’s claim? (Adapted from U.S. Bureau of Labor Statistics) 


29. Ages of Brides A marriage counselor estimates that the median age of 
brides at the time of their first marriage is less than or equal to 26 years. In 
a random sample of 65 brides, 24 are less than 26 years old, 35 are more than 
26 years old, and 6 are 26 years old. At α = 0.05, can you reject the
counselor’s claim? (Adapted from U.S. Census Bureau) 


30. Ages of Grooms A marriage counselor estimates that the median age of 
grooms at the time of their first marriage is greater than 28 years. In a 
random sample of 56 grooms, 33 are less than 28 years old, 23 are more than 
28 years old, and none are 28 years old. At α = 0.05, can you support the
counselor’s claim? (Adapted from U.S. Census Bureau)


» How to use the Wilcoxon signed-rank test to determine if two dependent
samples are selected from populations having the same distribution

» How to use the Wilcoxon rank sum test to determine if two independent
samples are selected from populations having the same distribution

STUDY TIP 


The absolute value of a number is its value, disregarding its sign.
A pair of vertical bars, | |, is used to denote absolute value. For
example, |3| = 3 and |−7| = 7.


The Wilcoxon Tests


WHAT YOU SHOULD LEARN 


The Wilcoxon Signed-Rank Test » The Wilcoxon Rank Sum Test 
> THE WILCOXON SIGNED-RANK TEST 


In this section, you will study the Wilcoxon signed-rank test and the Wilcoxon 
rank sum test. Unlike the sign test from Section 11.1, the strength of these two 
nonparametric tests is that each considers the magnitude, or size, of the data 
entries. 

In Section 8.3, you used a t-test together with dependent samples to
determine whether there was a difference between two populations. To use the 
t-test to test such a difference, you must assume (or know) that the dependent 
samples are randomly selected from populations having a normal distribution. 
But, what if this assumption cannot be made? Instead of using the two-sample 
t-test, you can use the Wilcoxon signed-rank test. 


DEFINITION 


The Wilcoxon signed-rank test is a nonparametric test that can be used to 
determine whether two dependent samples were selected from populations 
having the same distribution. 


GUIDELINES
Performing a Wilcoxon Signed-Rank Test


1. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H₀ and Hₐ.

2. Specify the level of significance.
   In symbols: Identify α.

3. Determine the sample size n, which is the number of pairs of data for
   which the difference is not 0.

4. Determine the critical value.
   In symbols: Use Table 9 in Appendix B.

5. Find the test statistic wₛ.
   a. Complete a table using the following headers: Sample 1, Sample 2,
      Difference, Absolute value, Rank, and Signed rank. Signed rank takes
      on the same sign as its corresponding difference.
   b. Find the sum of the positive ranks and the sum of the negative ranks.
   c. Select the smaller absolute value of the sums.

6. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If wₛ is less than or equal to the critical value, reject
   H₀. Otherwise, fail to reject H₀.

7. Interpret the decision in the context of the original claim.


STUDY TIP


Do not assign a rank to any difference of 0. In the case of a tie between
data entries, use the average of the corresponding ranks. For instance, if
two data entries are tied for the fifth rank, use the average of 5 and 6,
which is 5.5, as the rank for both entries. The next data entry will be
assigned a rank of 7, not 6.

If three entries are tied for the fifth rank, use the average of 5, 6, and 7,
which is 6, as the rank for all three data entries. The next data entry will
be assigned a rank of 8.


EXAMPLE 1


> Performing a Wilcoxon Signed-Rank Test 


A golf club manufacturer believes that golfers can lower their scores by using 
the manufacturer’s newly designed golf clubs. The scores of 10 golfers while 
using the old design and while using the new design are shown in the table. At 
α = 0.05, can you support the manufacturer’s claim?


> Solution 
The claim is “golfers can lower their scores.” To test this claim, use the 
following null and alternative hypotheses. 

H₀: The new design does not lower scores.

Hₐ: The new design lowers scores. (Claim)

This Wilcoxon signed-rank test is a one-tailed test with α = 0.05, and because
one data pair has a difference of 0, n = 9 instead of 10. From Table 9 in
Appendix B, the critical value is 8. To find the test statistic wₛ, complete a table
as shown below.


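Score (old design) | Score (new design) | Difference | Absolute value | Rank | Signed rank
89 | 83 | 6 | 6 | 8 | 8
84 | 83 | 1 | 1 | 1 | 1
96 | 92 | 4 | 4 | 5.5 | 5.5
74 | 76 | −2 | 2 | 2.5 | −2.5
91 | 91 | 0 | 0 | - | -
85 | 80 | 5 | 5 | 7 | 7
95 | 87 | 8 | 8 | 9 | 9
82 | 85 | −3 | 3 | 4 | −4
92 | 90 | 2 | 2 | 2.5 | 2.5
81 | 77 | 4 | 4 | 5.5 | 5.5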

The sum of the negative ranks is
−2.5 + (−4) = −6.5.

The sum of the positive ranks is
8 + 1 + 5.5 + 7 + 9 + 2.5 + 5.5 = 38.5.


The test statistic is the smaller absolute value of these two sums. Because
|−6.5| < |38.5|, the test statistic is wₛ = 6.5. Because the test statistic is less
than the critical value, that is, 6.5 < 8, you should decide to reject the null
hypothesis.


Interpretation There is enough evidence at the 5% level of significance to
support the claim that golfers can lower their scores by using the newly 
designed clubs. 
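
The ranking steps in this example can be reproduced programmatically. The following Python sketch is an illustration only: the helper names `average_ranks` and `wilcoxon_signed_rank` are hypothetical, and the sums of the negative ranks are reported as positive magnitudes, which corresponds to taking absolute values before choosing the smaller sum. The critical value still has to be looked up in Table 9.

```python
def average_ranks(values):
    """Rank values in ascending order (1-based); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every entry tied with the entry at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def wilcoxon_signed_rank(sample1, sample2):
    """Return n, sum of positive ranks, |sum of negative ranks|, and w_s."""
    diffs = [a - b for a, b in zip(sample1, sample2) if a != b]  # drop 0s
    ranks = average_ranks([abs(d) for d in diffs])
    positive_sum = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative_sum = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return len(diffs), positive_sum, negative_sum, min(positive_sum, negative_sum)


old_design = [89, 84, 96, 74, 91, 85, 95, 82, 92, 81]
new_design = [83, 83, 92, 76, 91, 80, 87, 85, 90, 77]
n, positive_sum, negative_sum, w_s = wilcoxon_signed_rank(old_design, new_design)
print(n, positive_sum, negative_sum, w_s)
```

For the golf scores, the sketch gives n = 9, rank sums of 38.5 and 6.5, and a test statistic of 6.5, in agreement with the table above.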
To help determine when knee arthroscopy patients can resume driving after
surgery, the driving reaction times (in milliseconds) of 10 right knee
arthroscopy patients were measured before surgery and 4 weeks after surgery
using a computer-linked car simulator. The results are shown in the table.
(Adapted from Knee Surgery, Sports Traumatology, Arthroscopy Journal)


Patient | Reaction time before surgery | Reaction time 4 weeks after surgery
1 | 720 | 730
2 | 750 | 645
3 | 735 | 745
4 | 730 | 640
5 | 755 | 660
6 | 745 | 670
7 | 730 | 650
8 | 725 | 730
9 | 770 | 675
10 | 700 | 705


At α = 0.05, can you conclude that the reaction times changed
significantly four weeks after surgery?


STUDY TIP 


Use the Wilcoxon 
signed-rank test for 
dependent samples and 
the Wilcoxon rank sum 
test for independent 
samples. 
> Try It Yourself 1


A quality control inspector wants to test the claim that a spray-on water 
repellent is effective. To test this claim, he selects 12 pieces of fabric, sprays 
water on each, and measures the amount of water repelled (in milliliters). He 
then applies the water repellent and repeats the experiment. The results are 
shown in the table. At α = 0.01, can he conclude that the water repellent
is effective? 


[Table: amount of water repelled (in milliliters) for each of the 12 pieces of fabric, without and with the repellent]


a. Identify the claim and state H₀ and Hₐ.
b. Specify the level of significance α.
c. Determine the sample size n.
d. Determine the critical value.
e. Find the test statistic wₛ by making a table, finding the sum of the positive
   ranks and the sum of the negative ranks, and finding the absolute value of
   each.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.

Answer: Page A47

> THE WILCOXON RANK SUM TEST 


In Sections 8.1 and 8.2, you used a z-test or a t-test together with independent
samples to determine whether there was a difference between two populations.
To use the z-test to test such a difference, you must assume (or know) that the
independent samples are randomly selected and that either each sample size is at
least 30 or each population has a normal distribution with a known standard
deviation. To use the t-test to test such a difference, you must assume (or know)
that the independent samples are randomly selected from populations having a 
normal distribution. But, what if these assumptions cannot be made? You can 
still compare the populations using the Wilcoxon rank sum test. 


DEFINITION 


The Wilcoxon rank sum test is a nonparametric test that can be used to 
determine whether two independent samples were selected from populations 
having the same distribution. 


A requirement for the Wilcoxon rank sum test is that the sample sizes of both
samples must be at least 10. When calculating the test statistic for the Wilcoxon
rank sum test, let n₁ represent the sample size of the smaller sample and n₂
represent the sample size of the larger sample. If the two samples have the same
size, it does not matter which one is n₁ or n₂.

When calculating the sum of the ranks R, combine both samples and rank the
combined data. Then sum the ranks for the smaller of the two samples. If the two
samples have the same size, you can use the ranks from either sample, but you
must use the ranks from the sample you associate with n₁.
TEST STATISTIC FOR THE WILCOXON RANK SUM TEST


Given two independent samples, the test statistic z for the Wilcoxon rank sum
test is

$$z = \frac{R - \mu_R}{\sigma_R}$$

where

R = sum of the ranks for the smaller sample,

$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2},$$

and

$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}.$$


GUIDELINES


Performing a Wilcoxon Rank Sum Test


1. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H₀ and Hₐ.

2. Specify the level of significance.
   In symbols: Identify α.

3. Determine the critical value(s) and the rejection region(s).
   In symbols: Use Table 4 in Appendix B.

4. Determine the sample sizes.
   In symbols: n₁ ≤ n₂

5. Find the sum of the ranks for the smaller sample.
   In symbols: R
   a. List the combined data in ascending order.
   b. Rank the combined data.
   c. Add the ranks for the smaller sample, n₁.

6. Find the test statistic and sketch the sampling distribution.
   In symbols: $z = \frac{R - \mu_R}{\sigma_R}$

7. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If z is in the rejection region, reject H₀. Otherwise, fail
   to reject H₀.

8. Interpret the decision in the context of the original claim.


EXAMPLE 2


> Performing a Wilcoxon Rank Sum Test 


The table shows the earnings (in thousands of dollars) of a random sample of 
10 male and 12 female pharmaceutical sales representatives. At α = 0.10,
can you conclude that there is a difference between the males’ and females’ 
earnings? 


Male earnings | 78 | 93 | 114 | 101 | 98 | 94 | 86 | 95 | 117 | 99

Female earnings | 86 | 77 | 101 | 93 | 85 | 98 | 91 | 87 | 84 | 97 | 100 | 90


> Solution 


The claim is “there is a difference between the males’ and females’ earnings.” 
The null and alternative hypotheses for this test are as follows. 


H₀: There is no difference between the males’ and the females’ earnings.


Hₐ: There is a difference between the males’ and the females’ earnings.
(Claim)


Because the test is a two-tailed test with α = 0.10, the critical values are
−z₀ = −1.645 and z₀ = 1.645. The rejection regions are z < −1.645 and
z > 1.645.


The sample size for men is 10 and the sample size for women is 12. Because
10 < 12, n₁ = 10 and n₂ = 12. Before calculating the test statistic, you must
find the values of R, $\mu_R$, and $\sigma_R$. The table shows the combined data listed in
ascending order and the corresponding ranks.


STUDY TIP
Remember that in the case of a tie between data entries, use the average of
the corresponding ranks.

Ordered data | Sample | Rank | Ordered data | Sample | Rank
77 | F | 1 | 94 | M | 12
78 | M | 2 | 95 | M | 13
84 | F | 3 | 97 | F | 14
85 | F | 4 | 98 | M | 15.5
86 | M | 5.5 | 98 | F | 15.5
86 | F | 5.5 | 99 | M | 17
87 | F | 7 | 100 | F | 18
90 | F | 8 | 101 | M | 19.5
91 | F | 9 | 101 | F | 19.5
93 | M | 10.5 | 114 | M | 21
93 | F | 10.5 | 117 | M | 22


Because the smaller sample is the sample of males, R is the sum of the male 
rankings. 


$$R = 2 + 5.5 + 10.5 + 12 + 13 + 15.5 + 17 + 19.5 + 21 + 22 = 138$$

Using n₁ = 10 and n₂ = 12, you can find $\mu_R$ and $\sigma_R$ as follows.

$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{10(10 + 12 + 1)}{2} = 115$$

$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(10)(12)(10 + 12 + 1)}{12}} \approx 15.17$$

When R = 138, $\mu_R = 115$, and $\sigma_R \approx 15.17$, the test statistic is

$$z = \frac{R - \mu_R}{\sigma_R} \approx \frac{138 - 115}{15.17} \approx 1.52.$$


From the graph at the right, you can see that the test statistic z is not in the
rejection region. So, you should decide to fail to reject the null hypothesis.

[Graph: standard normal curve with critical values −z₀ = −1.645 and z₀ = 1.645 and the test statistic z ≈ 1.52]

Interpretation There is not enough evidence at the 10% level of significance
to conclude that there is a difference between the males’ and females’ earnings.
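
The rank sum computation in this example can be sketched in Python as follows. This is a hedged illustration, not the textbook's method of record: the names `midrank` and `wilcoxon_rank_sum` are hypothetical, tied values receive the average of their ranks as described in the Study Tip, and the two-tailed critical value is taken from `statistics.NormalDist` instead of Table 4.

```python
from math import sqrt
from statistics import NormalDist


def midrank(value, pool):
    """Rank of value within pool, averaging the ranks of tied entries."""
    less = sum(u < value for u in pool)
    equal = sum(u == value for u in pool)
    return less + (equal + 1) / 2


def wilcoxon_rank_sum(sample_a, sample_b):
    """Return R, mu_R, sigma_R and z, with R taken from the smaller sample."""
    smaller, larger = sorted((sample_a, sample_b), key=len)
    n1, n2 = len(smaller), len(larger)
    pool = list(smaller) + list(larger)
    R = sum(midrank(v, pool) for v in smaller)
    mu_R = n1 * (n1 + n2 + 1) / 2
    sigma_R = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return R, mu_R, sigma_R, (R - mu_R) / sigma_R


male = [78, 93, 114, 101, 98, 94, 86, 95, 117, 99]
female = [86, 77, 101, 93, 85, 98, 91, 87, 84, 97, 100, 90]

R, mu_R, sigma_R, z = wilcoxon_rank_sum(male, female)
z0 = NormalDist().inv_cdf(1 - 0.10 / 2)   # two-tailed test, alpha = 0.10
reject = abs(z) > z0
print(R, mu_R, round(sigma_R, 2), round(z, 2), reject)
```

Run on the earnings data, the sketch returns the same rank sum, mean, and standard deviation as the worked solution, and the test statistic does not fall in either rejection region.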


> Try It Yourself 2 


You are investigating the automobile insurance claims paid (in thousands of 
dollars) by two insurance companies. The table shows a random, independent 
sample of 12 claims paid by the two insurance companies. At α = 0.05, can you
conclude that there is a difference in the claims paid by the companies? 


6.2 | 10.6 | 2.5 | 4.5 | 6.5 | 7.4
4.3 | 5.6 | 3.4 | 1.8 | 2.2 | 4.7


9.9 | 3.0 | 5.8 | 3.9 | 6.0 | 6.3


10.8 | 4.1 | 1.7 | 3.0 | 4.4 | Bye)


a. Identify the claim and state H₀ and Hₐ.
b. Specify the level of significance α.
c. Determine the critical value(s) and the rejection region(s).
d. Determine the sample sizes n₁ and n₂.
e. List the combined data in ascending order, rank the data, and find the sum
   of the ranks of the smaller sample.
f. Find the test statistic z. Sketch a graph.
g. Decide whether to reject the null hypothesis.
h. Interpret the decision in the context of the original claim.
Answer: Page A47


BUILDING BASIC SKILLS AND VOCABULARY


1. How do you know whether to use a Wilcoxon signed-rank test or a Wilcoxon 
rank sum test? 


2. What is the requirement for the sample size of both samples when using the 
Wilcoxon rank sum test? 


USING AND INTERPRETING CONCEPTS


Performing a Wilcoxon Test In Exercises 3-8,


(a) identify the claim and state H₀ and Hₐ.

(b) decide whether to use a Wilcoxon signed-rank test or a Wilcoxon rank sum test. 
(c) determine the critical value(s). 

(d) find the test statistic. 

(e) decide whether to reject or fail to reject the null hypothesis. 


(f) interpret the decision in the context of the original claim. 


3. Calcium Supplements and Blood Pressure In a study testing the effects
of calcium supplements on blood pressure in men, 12 men were randomly
chosen and given a calcium supplement for 12 weeks. The measurements
shown in the table are for each subject’s diastolic blood pressure taken
before and after the 12-week treatment period. At α = 0.01, can you
reject the claim that there was no reduction in diastolic blood pressure?
(Adapted from The Journal of the American Medical Association)


135 | 124 | 118 | 130 | 115
122 | 120 | 126 | 128 | 106


4. Wholesale Trade and Manufacturing A private industry analyst claims
that there is no difference in the salaries earned by workers in
the wholesale trade and manufacturing industries. A random sample of
10 wholesale trade and 10 manufacturing workers and their salaries
(in thousands of dollars) are shown in the table. At α = 0.10, can you
reject the analyst’s claim? (Adapted from U.S. Bureau of Economic Analysis)


62 | 58 | 47 | 65 | 45 | 56 | 67 | 49 | 55 | 43
5. Drug Prices A researcher wants to determine whether the cost of
prescription drugs is lower in Canada than in the United States. The
researcher selects seven of the most popular brand-name prescription drugs
and records the cost per pill (in U.S. dollars) of each. The results are shown
in the table. At α = 0.05, can the researcher conclude that the cost of
prescription drugs is lower in Canada than in the United States? (Adapted
from Annals of Internal Medicine)


1 2 3 4 5 6 7
1.26 1.76 4.19 3.36 1.80 9.91 3.95
1.04 0.82 2.22 2.22 1.31 11.47 2.63


6. Earnings by Degree A college administrator believes that there is a
difference in the earnings of people with bachelor’s degrees and those
with associate’s degrees. The table shows the earnings (in thousands of
dollars) of a random sample of 11 people with bachelor’s degrees and
10 people with associate’s degrees. At α = 0.05, is there enough evidence
to support the administrator’s belief? (Adapted from U.S. Census Bureau)


54 50 63 76 70 50 44 56 60 52 54
36 39 47 33 38 38 45 45 42 34


7. Teacher Salaries A teacher’s union representative claims that there is a
difference in the salaries earned by teachers in Wisconsin and Michigan.
The table shows the salaries (in thousands of dollars) of a random
sample of 11 teachers from Wisconsin and 12 teachers from Michigan.
At α = 0.05, is there enough evidence to support the representative’s
claim? (Adapted from National Education Association)


S159 «52 46) S51 | 55 53 SL 50 5064 
57. 61 51) 58 | 53 | 63) | 57 63 | 55 49 | 54 72 


8. Heart Rate A physician wants to determine whether an experimental
medication affects an individual’s heart rate. The physician selects
15 patients and measures the heart rate of each. The subjects then take
the medication and have their heart rates measured after one hour. The
results are shown in the table. At α = 0.05, can the physician conclude
that the experimental medication affects an individual’s heart rate?


81 75 76) 79 74 65 67 
80 75 79 | 74 76 73 67 
EXTENDING CONCEPTS


Wilcoxon Signed-Rank Test for n > 30 If you are performing a Wilcoxon
signed-rank test and the sample size n is greater than 30, you can use the Standard
Normal Table and the following formula to find the test statistic.

$$z = \frac{w_s - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$$

In Exercises 9 and 10, perform the indicated Wilcoxon signed-rank test using the
test statistic for n > 30.
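
The large-sample formula above can be sketched in Python. This is a minimal illustration, not the textbook's procedure verbatim: the helper names `average_ranks`, `signed_rank_statistic`, and `signed_rank_z` are made up for this sketch, and it assumes that w_s is the smaller of the two rank sums (positive and negative differences), that zero differences are dropped, and that tied absolute differences receive the average of their ranks.

```python
import math


def average_ranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_statistic(differences):
    """Return (w_s, n): the smaller rank sum and the number of nonzero differences."""
    nonzero = [d for d in differences if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    positive = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    negative = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return min(positive, negative), len(nonzero)


def signed_rank_z(w_s, n):
    """Standardize w_s with the normal approximation used when n > 30."""
    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_s - mean) / std


# Hypothetical values for illustration only (not taken from the exercises).
w_small, n_small = signed_rank_statistic([1, -2, 3, -4, 5])
z_example = signed_rank_z(100, 33)
print(w_small, n_small, round(z_example, 3))
```

The resulting z would then be compared with critical values from the Standard Normal Table, as the text describes.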


9. Fuel Additive A petroleum engineer wants to know whether a certain
fuel additive improves a car’s gas mileage. To decide, the engineer records
the gas mileages (in miles per gallon) of 33 cars with and without the
additive. The results are shown in the table. At α = 0.10, can the engineer
conclude that the gas mileage is improved?


2 3 4 5 6 7 8 9 10 11
36.4 36.6 36.6 36.8 36.9 37.0 37.1 37.2 37.2 36.7
36.9 37.0 37.5 38.0 38.1 38.4 38.7 38.8 38.9 36.3


13 14 15 16 17 18 19 20 21 22
31.6 37.8 37.9 37.9 38.1 38.4 40.2 40.5 40.9 35.0
39.0 39.1 39.4 39.4 39.5 39.8 40.0 40.0 40.1 36.3


23 24 25 26 27 28 29 30 31 32 33
32.7 33.6 34.2 35.1 35.2 35.3 35.5 35.9 36.0 36.1 37.2
32.8 34.2 34.7 34.9 34.9 35.3 35.9 36.4 36.6 36.6 38.3


10. Fuel Additive A petroleum engineer claims that a fuel additive
improves gas mileage. The table shows the gas mileages (in miles per
gallon) of 32 cars measured with and without the fuel additive. Test the
petroleum engineer’s claim at α = 0.05.

2 3 4 5 6 7 8 9 10 11
34.2 34.4 34.4 34.6 34.8 35.6 35.7 30.2 31.6 32.3
36.7 37.2 37.2 37.3 37.4 37.6 37.7 34.2 34.9 34.9


13 14 15 16 17 18 19 20 21 22
33.1 33.7 33.7 33.8 35.7 36.1 36.1 36.6 36.6 36.8
35.7 36.0 36.2 36.5 37.8 38.1 38.2 38.3 38.3 38.7


24 25 26 27 28 29 30 31 32
37.1 37.1 37.2 37.9 37.9 38.0 38.0 38.4 38.8 42.1
38.8 38.9 39.1 39.1 39.2 39.4 39.8 40.3 40.8 43.2
College Ranks


Each year, Forbes and the Center for College Affordability and Productivity release a list of the
best colleges in America. Six hundred undergraduate colleges and universities are ranked according
to quality of education, 4-year graduation rate, post-graduate success, average student debt after
4 years, and number of students or faculty who have won competitive awards, such as Rhodes
Scholarships or Nobel Prizes.


The table shows freshman class size by state for randomly selected colleges on the 2009 list.


Freshman Class Size

CA     MA     NC     PA
236    540    3865   372
1703   1666   1699   327
320    596    1201   366
382    439    2073   588
202    1048   2781   957
202    2167   1291   453
458    643    2492   2400
252    754    3090   601
467    1297   4538   613
574    518    4804   399


EXERCISES


1. Construct a side-by-side box-and-whisker plot for the four states. Do any of the
median freshman class sizes appear to be the same? Do any appear to be different?


In Exercises 2-5, use the sign test to test the claim. What can you conclude? Use α = 0.05.


2. The median freshman class size at a California college is less than or equal to 400.

3. The median freshman class size at a Massachusetts college is greater than or
equal to 750.

4. The median freshman class size at a Pennsylvania college is 500.

5. The median freshman class size at a North Carolina college is different from 2400.


In Exercises 6 and 7, use the Wilcoxon rank sum test to test the claim. Use α = 0.01.


6. There is no difference between freshman class sizes for Pennsylvania colleges and
California colleges.

7. There is a difference between freshman class sizes for Massachusetts colleges and
North Carolina colleges.


WHAT YOU SHOULD LEARN 


> How to use the Kruskal-Wallis 
test to determine whether 
three or more samples were 
selected from populations 
having the same distribution 
The Kruskal-Wallis Test


In Section 10.4, you learned how to use one-way ANOVA techniques to compare 
the means of three or more populations. When using one-way ANOVA, you 
should verify that each independent sample is selected from a population that is 
normally, or approximately normally, distributed. If, however, you cannot verify 
that the populations are normal, you can still compare the distributions of three 
or more populations. To do so, you can use the Kruskal-Wallis test. 


DEFINITION 


The Kruskal-Wallis test is a nonparametric test that can be used to determine 
whether three or more independent samples were selected from populations 
having the same distribution. 


The null and alternative hypotheses for the Kruskal-Wallis test are as 
follows. 


H0: There is no difference in the distribution of the populations.
Ha: There is a difference in the distribution of the populations.


Two conditions for using the Kruskal-Wallis test are that each sample must 
be randomly selected and the size of each sample must be at least 5. If these 
conditions are met, then the sampling distribution for the Kruskal-Wallis test is 
approximated by a chi-square distribution with k - 1 degrees of freedom, where
k is the number of samples. You can calculate the Kruskal-Wallis test statistic 
using the following formula. 


TEST STATISTIC FOR THE KRUSKAL-WALLIS TEST 


Given three or more independent samples, the test statistic for the
Kruskal-Wallis test is

$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$

where

k represents the number of samples,
n_i is the size of the ith sample,
N is the sum of the sample sizes,

and

R_i is the sum of the ranks of the ith sample.


Performing a Kruskal-Wallis test consists of combining and ranking the 
sample data. The data are then separated according to sample and the sum of the 
ranks of each sample is calculated. 
These sums are then used to calculate the test statistic H, which is an
approximation of the variance of the rank sums. If the samples are selected 
from populations having the same distribution, the sums of the ranks will be 
approximately equal, H will be small, and the null hypothesis should not be 
rejected. 

If, however, the samples are selected from populations not having the same 
distribution, the sums of the ranks will be quite different, H will be large, and the 
null hypothesis should be rejected. 

Because the null hypothesis is rejected only when H is significantly large, the
Kruskal-Wallis test is always a right-tailed test. 
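
The behavior of H described above can be checked with a short Python sketch. The function name `kruskal_wallis_h` and the rank sums below are hypothetical, chosen only to illustrate the two cases: three samples of size 10 whose rank sums are equal, and three samples whose rank sums are as far apart as possible.

```python
def kruskal_wallis_h(rank_sums, sample_sizes):
    """Compute H from the rank sum and the size of each sample."""
    n_total = sum(sample_sizes)
    weighted = sum(r * r / n for r, n in zip(rank_sums, sample_sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


sizes = [10, 10, 10]

# Hypothetical case 1: the ranks 1..30 are spread evenly (each sum is 155).
h_equal = kruskal_wallis_h([155, 155, 155], sizes)

# Hypothetical case 2: one sample holds ranks 1..10, another 11..20, the last 21..30.
h_separated = kruskal_wallis_h([55, 155, 255], sizes)

print(round(h_equal, 3), round(h_separated, 3))
```

In the first case H is essentially 0, and in the second it is large, which is consistent with rejecting the null hypothesis only for large values of H.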


GUIDELINES


Performing a Kruskal-Wallis Test

IN WORDS                                      IN SYMBOLS
1. Identify the claim. State the null         State H0 and Ha.
   and alternative hypotheses.
2. Specify the level of significance.         Identify α.
3. Identify the degrees of freedom.           d.f. = k - 1
4. Determine the critical value and the       Use Table 6 in Appendix B.
   rejection region.
5. Find the sum of the ranks for each sample.
   a. List the combined data in ascending order.
   b. Rank the combined data.
6. Find the test statistic and sketch the     $H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$
   sampling distribution.
7. Make a decision to reject or fail to       If H is in the rejection region, reject H0.
   reject the null hypothesis.                Otherwise, fail to reject H0.
8. Interpret the decision in the context
   of the original claim.
EXAMPLE 1 SC Report 51


> Performing a Kruskal-Wallis Test 


You want to compare the number of crimes reported in three police precincts
in a city. To do so, you randomly select 10 weeks for each precinct and record
the number of crimes reported. The results are shown in the table. At α = 0.01,
can you conclude that the distributions of crimes reported in the three police
precincts are different?


> Solution


You want to test the claim that there is a difference in the number of crimes
reported in the three precincts. The null and alternative hypotheses are as
follows.


H0: There is no difference in the number of crimes reported in the three
precincts.


Ha: There is a difference in the number of crimes reported in the three
precincts. (Claim)


The test is a right-tailed test with α = 0.01 and d.f. = k - 1 = 3 - 1 = 2.
From Table 6, the critical value is $\chi^2_0 = 9.210$. Before calculating the test
statistic, you must find the sum of the ranks for each sample. The table shows
the combined data listed in ascending order and the corresponding ranks.


Ordered data  Sample  Rank    Ordered data  Sample  Rank    Ordered data  Sample  Rank
44            101st   1       54            106th   11      62            113th   20.5
45            101st   2       55            106th   12      63            113th   22
48            101st   3       56            101st   13      64            106th   23
49            101st   4       57            101st   14      65            106th   24.5
50            101st   5.5     58            106th   15      65            113th   24.5
50            106th   5.5     59            113th   16      66            106th   26
51            113th   7       60            101st   17.5    67            113th   27
52            101st   8.5     60            113th   17.5    69            113th   28
52            101st   8.5     61            113th   19      70            106th   29.5
53            106th   10      62            106th   20.5    70            113th   29.5
The following randomly collected
data were used to compare the 
water temperatures (in degrees 
Fahrenheit) of cities bordering 
the Gulf of Mexico. (Adapted from 
National Oceanographic Data Center) 


51 
55 
57 
63 
74 
82 
85 
60 
64 
76 
83 
86 


At α = 0.05, can you conclude
that the temperature
distributions of the three
cities are different?


The sum of the ranks for each sample is as follows.
R1 = 1 + 2 + 3 + 4 + 5.5 + 8.5 + 8.5 + 13 + 14 + 17.5 = 77
R2 = 5.5 + 10 + 11 + 12 + 15 + 20.5 + 23 + 24.5 + 26 + 29.5 = 177
R3 = 7 + 16 + 17.5 + 19 + 20.5 + 22 + 24.5 + 27 + 28 + 29.5 = 211


Using these sums and the values n1 = 10, n2 = 10, n3 = 10, and N = 30, the
test statistic is

$$H = \frac{12}{30(30+1)}\left(\frac{77^2}{10} + \frac{177^2}{10} + \frac{211^2}{10}\right) - 3(30+1) \approx 12.521.$$
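
The ranking and the calculation of H in this example can be reproduced with a short Python sketch. The precinct lists are read off the ranked table in the solution; the helper names `average_ranks` and `kruskal_wallis` are illustrative choices, and the critical value 9.210 is the one quoted from Table 6 rather than computed.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(samples):
    """Return the rank sum of each sample and the test statistic H."""
    combined = [x for sample in samples for x in sample]
    ranks = average_ranks(combined)
    rank_sums, start = [], 0
    for sample in samples:
        rank_sums.append(sum(ranks[start:start + len(sample)]))
        start += len(sample)
    n_total = len(combined)
    weighted = sum(r * r / len(s) for r, s in zip(rank_sums, samples))
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return rank_sums, h


# Weekly crime counts per precinct, as listed in the ranked table of Example 1.
precinct_101 = [44, 45, 48, 49, 50, 52, 52, 56, 57, 60]
precinct_106 = [50, 53, 54, 55, 58, 62, 64, 65, 66, 70]
precinct_113 = [51, 59, 60, 61, 62, 63, 65, 67, 69, 70]

rank_sums, h = kruskal_wallis([precinct_101, precinct_106, precinct_113])
critical_value = 9.210  # from Table 6, alpha = 0.01, d.f. = 2
reject_null = h > critical_value
print(rank_sums, round(h, 3), reject_null)
```

The sketch gives rank sums of 77, 177, and 211 and H of about 12.521, matching the hand calculation.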


From the graph, you can see that the test statistic H is in the rejection
region. So, you should decide to reject the null hypothesis.


Interpretation There is enough evidence at the 1% level of significance to support
the claim that there is a difference in the number of crimes reported in the three
police precincts.

[Graph: chi-square distribution with the rejection region (α = 0.01) to the right of the critical value $\chi^2_0 = 9.210$; the test statistic H ≈ 12.521 lies in the rejection region.]


> Try It Yourself 1 


You want to compare the salaries of veterinarians who work in California,
New York, and Pennsylvania. To compare the salaries, you randomly select
several veterinarians in each state and record their salaries. The salaries
(in thousands of dollars) are listed in the table. At α = 0.05, can you
conclude that the distributions of the veterinarians’ salaries in these
three states are different? (Adapted from U.S. Bureau of Labor Statistics)

99.95 94.40 99.20
97.50 99.75 103.70
98.85 97.50 110.45
100.75 101.97 95.15
101.20 93.10 88.80
96.25 102.35 99.99
99.70 97.89 100.55
88.28 92.50 97.25
113.90 101.55 97.44
103.20


a. Identify the claim and state H0 and Ha.
b. Specify the level of significance α.
c. Identify the degrees of freedom.
d. Determine the critical value and the rejection region.
e. List the combined data in ascending order, rank the data, and find the sum
of the ranks of each sample.
f. Find the test statistic H. Sketch a graph.
g. Decide whether to reject the null hypothesis.
h. Interpret the decision in the context of the original claim.
Answer: Page A48


BUILDING BASIC SKILLS AND VOCABULARY


1. What are the conditions for using a Kruskal-Wallis test? 


2. Explain why the Kruskal-Wallis test is always a right-tailed test. 


USING AND INTERPRETING CONCEPTS


Performing a Kruskal-Wallis Test In Exercises 3-6, (a) identify the claim
and state H0 and Ha, (b) determine the critical value, (c) find the sums of the ranks
for each sample and calculate the test statistic, (d) decide whether to reject or fail
to reject the null hypothesis, and (e) interpret the decision in the context of the
original claim.


3. Home Insurance The table shows the annual premiums for a random
sample of home insurance policies in Connecticut, Massachusetts, and
Virginia. At α = 0.05, can you conclude that the distributions of the
annual premiums in these three states are different? (Adapted from
National Association of Insurance Commissioners)


Connecticut 930 725 890 1040 1165 806 947
Massachusetts 1105 1025 980 1295 1110 889 757
Virginia 815 730 546 625 912 618 535


4. Hourly Rates A researcher wants to determine whether there is a
difference in the hourly pay rates for registered nurses in three states:
Indiana, Kentucky, and Ohio. The researcher randomly selects several
registered nurses in each state and records the hourly pay rate for each
in the table shown. At α = 0.05, can the researcher conclude that the
distributions of the registered nurses’ hourly pay rates in these three
states are different? (Adapted from U.S. Bureau of Labor Statistics)


Indiana 27.80 28.25 26.65 27.40 30.24 25.10 29.44
Kentucky 26.95 25.58 28.10 30.20 28.55 31.60 24.60
Ohio 25.75 30.15 31.55 31.82 25.25 27.80


5. Annual Salaries The table shows the annual salaries for a random
sample of workers in Kentucky, North Carolina, South Carolina, and
West Virginia. At α = 0.10, can you conclude that the distributions of the
annual salaries in these four states are different? (Adapted from U.S. Bureau
of Labor Statistics)


Kentucky 32.5 34.2 43.1 54.7 30.9 25.5
North Carolina 40.5 38.9 33.6 51.3 32.5 36.6
South Carolina 27.8 35.4 41.5 40.9 32.7 34.1
West Virginia 27.1 38.2 28.9 37.4 42.6 30.4
6. Caffeine Content The table shows the amounts of caffeine (in milligrams)
in 16-ounce servings for a random sample of beverages. At α = 0.01, can
you conclude that the distributions of the amounts of caffeine in these four
beverages are different? (Source: Center for Science in the Public Interest)


Coffees 320 300 206 150 266
Soft drinks 95 96 56 51 71 72 47
Energy drinks 200 141 160 152 154 166
Teas 100 106 42 15 32 10


In Exercises 7 and 8, use StatCrunch and the following table, which shows the
number of job offers received by mechanical engineers who recently graduated
from four colleges (A, B, C, D).


TABLE FOR EXERCISES 7 AND 8

A B C D
5 8 5 2
4 10 4 3
7 ? 3 5
6 7 5 4
5 10 7 2
+ 6 8 3


7. At α = 0.01, can you conclude that the distributions of the number of job
offers at Colleges A, B, and C are different?

8. At α = 0.01, can you conclude that the distributions of the number of job
offers at all four colleges are different?


EXTENDING CONCEPTS


Comparing Two Tests In Exercises 9 and 10, perform the indicated test
using (a) a Kruskal-Wallis test and (b) a one-way ANOVA test, assuming that each
population is normally distributed and the population variances are equal.
Compare the results. If convenient, use technology to solve the problem.


9. Hospital Patient Stays An insurance underwriter reports that the mean
number of days patients spend in a hospital differs according to the
region of the United States in which the patient lives. The table shows
the number of days randomly selected patients spent in a hospital in four
U.S. regions. At α = 0.01, can you support the underwriter’s claim?
(Adapted from U.S. National Center for Health Statistics)


Northeast 8 6 6,3 5 113) 8 6 
Midwest 5 4 3 9 1 4 6 3 4 7
South 5 8 1 5 8 7 5 1
West 2 3 6 6 5 4 3 6 5


10. Energy Consumption The table shows the energy consumed
(in millions of Btu) in one year for a random sample of households
from four U.S. regions. At α = 0.01, can you conclude that the
mean energy consumptions are different? (Adapted from U.S. Energy
Information Administration)


Northeast 72 106 151 138 104 108 95 134 100 174
Midwest 84 183 194 165 120 212 148 129 113 62 97
South 91 40 72 91 147 74 70 67
West 74 32 78 28 106 39 118 63 70 56
Rank Correlation


WHAT YOU SHOULD LEARN


> How to use the Spearman
rank correlation coefficient
to determine whether the
correlation between two
variables is significant


The Spearman Rank Correlation Coefficient


In Section 9.1, you learned how to measure the strength of the relationship 
between two variables using the Pearson correlation coefficient r. Two 
requirements for the Pearson correlation coefficient are that the variables are 
linearly related and that the population represented by each variable is normally 
distributed. If these requirements cannot be met, you can examine the relationship 
between two variables using the nonparametric equivalent to the Pearson 
correlation coefficient—the Spearman rank correlation coefficient. 

The Spearman rank correlation coefficient has several advantages over the 
Pearson correlation coefficient. For instance, the Spearman rank correlation 
coefficient can be used to describe the relationship between linear or nonlinear 
data. The Spearman rank correlation coefficient can be used for data at the ordinal 
level. And, the Spearman rank correlation coefficient is easier to calculate by 
hand than the Pearson coefficient. 


DEFINITION 


The Spearman rank correlation coefficient r_s is a measure of the strength of
the relationship between two variables. The Spearman rank correlation
coefficient is calculated using the ranks of paired sample data entries. If there
are no ties in the ranks of either variable, then the formula for the Spearman
rank correlation coefficient is

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where n is the number of paired data entries and d is the difference between
the ranks of a paired data entry. If there are ties in the ranks and the number
of ties is small relative to the number of data pairs, then the formula can still
be used to approximate r_s.


The values of r_s range from -1 to 1, inclusive. If the ranks of corresponding
data pairs are exactly identical, r_s is equal to 1. If the ranks are in “reverse” order,
r_s is equal to -1. If the ranks of corresponding data pairs have no relationship, r_s is
equal to 0.
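
The endpoints of this range can be verified with a small Python sketch of the no-ties formula. The function name `spearman_from_ranks` and the rank lists are hypothetical; the third list is simply one arrangement whose squared rank differences sum to 20, which makes the coefficient 0 for n = 5.

```python
def spearman_from_ranks(ranks_x, ranks_y):
    """Spearman coefficient from two lists of ranks (no-ties formula)."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


base = [1, 2, 3, 4, 5]

r_identical = spearman_from_ranks(base, [1, 2, 3, 4, 5])  # identical ranks
r_reverse = spearman_from_ranks(base, [5, 4, 3, 2, 1])    # ranks in reverse order
r_mixed = spearman_from_ranks(base, [2, 5, 3, 1, 4])      # sum of d^2 is 20

print(r_identical, r_reverse, r_mixed)
```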

After calculating the Spearman rank correlation coefficient, you can
determine whether the correlation between the variables is significant. You can
make this determination by performing a hypothesis test for the population
correlation coefficient ρ_s. The null and alternative hypotheses for this test are as
follows.


H0: ρ_s = 0 (There is no correlation between the variables.)
Ha: ρ_s ≠ 0 (There is a significant correlation between the variables.)


The critical values for the Spearman rank correlation coefficient are listed in
Table 10 of Appendix B. Table 10 lists critical values for selected levels of
significance and for sample sizes of 30 or less. The test statistic for the hypothesis
test is the Spearman rank correlation coefficient r_s.
GUIDELINES
Testing the Significance of the Spearman Rank Correlation Coefficient


IN WORDS                                      IN SYMBOLS
1. State the null and alternative             State H0 and Ha.
   hypotheses.
2. Specify the level of significance.         Identify α.
3. Determine the critical value.              Use Table 10 in Appendix B.
4. Find the test statistic.                   $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
5. Make a decision to reject or fail to       If |r_s| is greater than the critical value,
   reject the null hypothesis.                reject H0. Otherwise, fail to reject H0.
6. Interpret the decision in the context
   of the original claim.
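
Steps 4 and 5 of the guidelines can be sketched in Python. The data are the school enrollment figures (in millions) used in Example 1 of this section, and the critical value 0.738 is the Table 10 value quoted there; the helper names `average_ranks` and `spearman_rs` are illustrative. Tied entries receive the average of their ranks, so the formula is used as an approximation, as the definition allows.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Return (sum of squared rank differences, r_s) for paired data."""
    ranks_x, ranks_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Enrollments for 2000 through 2007, in millions.
male = [35.8, 36.3, 36.8, 37.3, 37.4, 37.4, 37.2, 37.6]
female = [36.4, 36.9, 37.3, 37.6, 38.0, 38.4, 38.0, 38.4]

sum_d2, r_s = spearman_rs(male, female)
critical_value = 0.738  # Table 10, alpha = 0.05, n = 8
reject_null = abs(r_s) > critical_value
print(sum_d2, round(r_s, 3), reject_null)
```

The sketch gives a sum of squared differences of 5.5 and r_s of about 0.935, so the null hypothesis is rejected at α = 0.05.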


EXAMPLE 1 


>» The Spearman Rank Correlation Coefficient 


The table shows the school enrollments (in millions) at all levels of education
for males and females from 2000 to 2007. At α = 0.05, can you conclude that
there is a correlation between the number of males and females enrolled in
school? (Source: U.S. Census Bureau)


Year  Male  Female
2000  35.8  36.4
2001  36.3  36.9
2002  36.8  37.3
2003  37.3  37.6
2004  37.4  38.0
2005  37.4  38.4
2006  37.2  38.0
2007  37.6  38.4


> Solution
The null and alternative hypotheses are as follows.


H0: ρ_s = 0 (There is no correlation between the number of males and
females enrolled in school.)


Ha: ρ_s ≠ 0 (There is a correlation between the number of males and
females enrolled in school.) (Claim)
Each data set has eight entries. From Table 10 with α = 0.05 and n = 8, the
critical value is 0.738. Before calculating the test statistic, you must find Σd²,
the sum of the squares of the differences of the ranks of the data sets. You can
use a table to calculate Σd², as shown below.

Male  Rank  Female  Rank  d     d²
35.8  1     36.4    1     0     0
36.3  2     36.9    2     0     0
36.8  3     37.3    3     0     0
37.3  5     37.6    4     1     1
37.4  6.5   38.0    5.5   1     1
37.4  6.5   38.4    7.5   -1    1
37.2  4     38.0    5.5   -1.5  2.25
37.6  8     38.4    7.5   0.5   0.25
                          Σd² = 5.5

When n = 8 and Σd² = 5.5, the test statistic is

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(5.5)}{8(8^2 - 1)} \approx 0.935.$$

Because |0.935| > 0.738, you should reject the null hypothesis.

Interpretation There is enough evidence at the 5% level of significance to
conclude that there is a correlation between the number of males and females
enrolled in school.

STUDY TIP
Remember that in the case of a tie between data entries, use the
average of the corresponding ranks.

The table shows the retail prices (in dollars per pound) for 100%
ground beef and fresh whole chicken in the United States from
2000 to 2008. (Source: U.S. Bureau of Labor Statistics)

Year Beef Chicken
2000 1.63 1.08
2001 Ball 1.11
2002 1.69 1.05
2003 2.23 1.05
2004 2.14 1.03
2005 2.30 1.06
2006 2.26 1.06
2007 2.23 1.17
2008 2.41 1.31

Does a correlation exist between ground beef and chicken prices
in the United States from 2000 to 2008? Use α = 0.10.

> Try It Yourself 1

The table shows the number of males and females (in thousands) who received
their doctoral degrees from 2001 to 2007. At α = 0.01, can you conclude that
there is a correlation between the number of males and females who received
doctoral degrees? (Source: U.S. National Center for Education Statistics)


2001 2002 2003 2004 2005 2006 2007


25 24 24 25 27 29 30 
20 20 22 23 26 27 30 


a. State the null and alternative hypotheses.
b. Specify the level of significance α.
c. Determine the critical value.
d. Use a table to calculate Σd².
e. Find the test statistic r_s.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.

Answer: Page A48
BUILDING BASIC SKILLS AND VOCABULARY


1. What are some advantages of the Spearman rank correlation coefficient over 
the Pearson correlation coefficient? 


2. Describe the ranges of the Spearman rank correlation coefficient and the 
Pearson correlation coefficient. 


3. What does it mean when r_s is equal to 1? What does it mean when r_s is equal
to -1? What does it mean when r_s is equal to 0?


4. Explain, in your own words, what r_s and ρ_s represent in Example 1.


USING AND INTERPRETING CONCEPTS


Testing a Claim In Exercises 5-8, (a) identify the claim and state H0 and Ha,
(b) determine the critical value using Table 10 in Appendix B, (c) find the
test statistic, (d) decide whether to reject or fail to reject the null hypothesis, and
(e) interpret the decision in the context of the original claim.


5. Farming: Debt and Income In an agricultural report, a commodities analyst 
suggests that there is a correlation between debt and income in the farming 
business. The table shows the total debts and total incomes for farms in seven 
states for a recent year. At α = 0.01, is there enough evidence to support the
analyst’s claim? (Adapted from U.S. Department of Agriculture) 


California 19,955 28,926 
Illinois 10,480 8,630 
Iowa 14,434 12,942 
Minnesota 9,982 8,807 
Nebraska 10,085 11,028 
North Carolina 4,235 7,008 
Texas 13,286 15,268 


6. Exercise Machines The table shows the overall scores and the prices
for 11 different models of elliptical exercise machines. The overall score
represents the ergonomics, exercise range, ease of use, construction,
heart-rate monitoring, and safety. At α = 0.05, can you conclude that
there is a correlation between the overall score and the price? (Source:
Consumer Reports)


85 78 77 75 73 71
2600 2800 3700 1700 1300 900


66 66 64 62 58
1000 1400 1800 1000 700
7. Crop Prices The table shows the prices (in dollars per bushel) received
by U.S. farmers for oat and wheat from 2000 to 2008. At α = 0.01, can you
conclude that there is a correlation between the oat and wheat prices?
(Source: U.S. Department of Agriculture)


Year   2000 2001 2002 2003 2004 2005 2006 2007 2008
Oat    1.10 1.59 1.81 1.48 1.48 1.63 1.87 2.63 3.10
Wheat  2.62 2.78 3.56 3.40 3.40 3.42 4.26 6.48 6.80


8. Vacuum Cleaners The table shows the overall scores and the prices for
12 different models of vacuum cleaners. The overall score represents
carpet and bare-floor cleaning, airflow, handling, noise, and emissions.
At α = 0.10, can you conclude that there is a correlation between the
overall score and the price? (Source: Consumer Reports)


73 65 60 71 62 39
230 400 600 350 100 300


67 64 68 60 70 55
600 700 140 200 80 300


Test Scores and GNI In Exercises 9-12, use the following table. The table
shows the average achievement scores of 15-year-olds in science and
mathematics along with the gross national income (GNI) of nine countries for
a recent year. (The GNI is a measure of the total value of goods and services
produced by the economy of a country.) (Adapted from Organization for
Economic Cooperation and Development; The World Bank)


Country        Science  Mathematics  GNI
Canada         534      527          1307
France         495      496          2467
Germany        516      504          3207
Italy          475      462          1988
Japan          531      523          4829
Mexico         410      406          989
Spain          488      480          1314
Sweden         503      502          438
United States  489      474          13,886


9. Science and GNI At α = 0.05, can you conclude that there is a
correlation between science achievement scores and GNI?


10. Math and GNI At α = 0.05, can you conclude that there is a correlation
between mathematics achievement scores and GNI?


11. Science and Math At α = 0.05, can you conclude that there is a correlation
between science and mathematics achievement scores?


12. Writing a Summary Use the results from Exercises 9-11 to write a summary
about the correlation (or lack of correlation) between test scores and GNI.
EXTENDING CONCEPTS


Testing the Rank Correlation Coefficient for n > 30 If you are testing
the significance of the Spearman rank correlation coefficient and the sample size n
is greater than 30, you can use the following expression to find the critical value.

$$\frac{\pm z}{\sqrt{n - 1}}, \quad z \text{ corresponds to the level of significance}$$

In Exercises 13 and 14, perform the indicated test.
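
The large-sample critical value can be sketched in Python with the standard library's normal distribution. This is a hedged illustration: the function name `spearman_critical_value` and the sample size 34 are hypothetical, and the sketch assumes a two-tailed test, so z is taken as the standard normal value that leaves α/2 in each tail.

```python
import math
from statistics import NormalDist


def spearman_critical_value(alpha, n):
    """Positive critical value z / sqrt(n - 1) for a two-tailed test with n > 30."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # assumes alpha is split across both tails
    return z / math.sqrt(n - 1)


# Hypothetical sample size chosen for illustration.
critical_value = spearman_critical_value(0.05, 34)
print(round(critical_value, 3))
```

Under these assumptions, the null hypothesis would be rejected when |r_s| exceeds the value returned.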


13. Work Injuries The table shows the average hours worked per week and
the number of on-the-job injuries for a random sample of U.S. industries
in a recent year. At α = 0.05, can you conclude that there is a correlation
between average hours worked and the number of on-the-job injuries?
(Adapted from U.S. Bureau of Labor Statistics; National Safety Council)


14. Work Injuries in Construction The table shows the average hours
worked per week and the number of on-the-job injuries for a random
sample of U.S. construction companies in a recent year. At α = 0.05,
can you conclude that there is a correlation between average hours
worked and the number of on-the-job injuries? (Adapted from U.S.
Bureau of Labor Statistics; National Safety Council)


40.5 38.3 37.8 38.2 38.6 41.2 39.0 41.0 40.6 44.1 39.7 41.2
12 13 19 18 22 22 17 13 15 10 18 19


41.1 38.2 42.3 39.2 36.1 36.2 38.7 36.0 37.3 36.5 37.9 38.0
13 24 12 12 13 15 18 11 24 16 13 23


36.7 40.1 35.5 38.2 42.3 39.0 39.6 39.1 39.6 39.1
14 10 5 14 13 18 15 23 15 23
The Runs Test


WHAT YOU SHOULD LEARN


> How to use the runs test to
determine whether a data set
is random


The Runs Test for Randomness


In obtaining a sample of data, it is important for the data to be selected randomly.
But how do you know if the sample data are truly random? One way to test for
randomness in a data set is to use a runs test for randomness.

Before using a runs test for randomness, you must first know how to
determine the number of runs in a data set.


DEFINITION 


A run is a sequence of data having the same characteristic. Each run is 
preceded by and followed by data with a different characteristic or by no data 
at all. The number of data in a run is called the length of the run. 


EXAMPLE 1 


> Finding the Number of Runs 


A liquid-dispensing machine has been designed to fill one-liter bottles. A 
quality control inspector decides whether each bottle is filled to an acceptable 
level and passes inspection (P) or fails inspection (F). Determine the number 
of runs for each sequence and find the length of each run. 


1. PPPPPPPPFFFFFFFF
2. PFPFPFPFPFPFPFPF
3. PPFFFFPFFFPPPPPP


> Solution 


1. There are two runs. The 8 P’s form a run of length 8 and the 8 F’s
form another run of length 8, as shown below.


PPPPPPPP   FFFFFFFF
1st run    2nd run


2. There are 16 runs, each of length 1, as shown below.


P   F   P   F   P   F   P   F   P   F   P   F   P   F   P   F
1st run, 2nd run, ..., 16th run


3. There are 5 runs, the first of length 2, the second of length 4, the third of
length 1, the fourth of length 3, and the fifth of length 6, as shown below.


PP        FFFF      P         FFF       PPPPPP
1st run   2nd run   3rd run   4th run   5th run
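
The counting in Example 1 can be sketched in Python. This is only an illustration, not part of the text: the function name `find_runs` is hypothetical, and it relies on the standard-library `itertools.groupby`, which groups consecutive equal items.

```python
from itertools import groupby


def find_runs(sequence):
    """Return a list of (characteristic, run length) pairs, in order."""
    # groupby starts a new group each time the characteristic changes,
    # which is exactly where one run ends and the next begins.
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]


# Sequence 3 from Example 1
runs = find_runs("PPFFFFPFFFPPPPPP")
print(len(runs))                       # number of runs
print([length for _, length in runs])  # length of each run
```

The same function works for any two-category sequence written as a string, such as the L/P or M/F sequences used later in this section.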
> Try It Yourself 1


A machine produces engine parts. An inspector measures the diameter of
each engine part and determines if the part passes inspection (P) or fails
inspection (F). The results are shown below. Determine the number of runs in
the sequence and find the length of each run.


PPPFPFPPPPFFPFPPRFRFRFPPPRFPPP 


a. Separate the data each time there is a change in the characteristic of the data. 
b. Count the number of groups to determine the number of runs. 
c. Count the number of data within each run to determine the length. 

Answer: Page A48 


When each value in a set of data can be categorized into one of two separate 
categories, you can use the runs test for randomness to determine whether the 
data are random. 


DEFINITION 


The runs test for randomness is a nonparametric test that can be used to 
determine whether a sequence of sample data is random. 


The runs test for randomness considers the number of runs in a sequence of 
sample data in order to test whether a sequence is random. If a sequence has too 
few or too many runs, it is usually not random. For instance, the sequence 


PPPPPPPPFFFFFFFF 
from Example 1, Part 1, has too few runs (only 2 runs). The sequence 
PFPFPFPFPFPFPFPF 


from Example 1, Part 2, has too many runs (16 runs). So, these sample data are 
probably not random. 

You can use a hypothesis test to determine whether the number of runs in a 
sequence of sample data is too high or too low. The runs test is a two-tailed test, 
and the null and alternative hypotheses are as follows. 


H₀: The sequence of data is random.
Hₐ: The sequence of data is not random.


When using the runs test, let n₁ represent the number of data that have one
characteristic and let n₂ represent the number of data that have the second
characteristic. It does not matter which characteristic you choose to be
represented by n₁. Let G represent the number of runs.


n₁ = number of data with one characteristic
n₂ = number of data with the other characteristic
G = number of runs


Table 12 in Appendix B lists the critical values for the runs test for selected
values of n₁ and n₂ at the α = 0.05 level of significance. (In this text, you will use
only the α = 0.05 level of significance when performing runs tests.) If n₁ or n₂ is
greater than 20, you can use the standard normal distribution to find the critical
values.
You can calculate the test statistic for the runs test as follows.


TEST STATISTIC FOR THE RUNS TEST


When n₁ ≤ 20 and n₂ ≤ 20, the test statistic for the runs test is G, the
number of runs.


When n₁ > 20 or n₂ > 20, the test statistic for the runs test is

$$z = \frac{G - \mu_G}{\sigma_G}$$

where

$$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1 \quad \text{and} \quad \sigma_G = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}.$$
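
The large-sample test statistic can be sketched in Python as follows. The function name `runs_test_z` is hypothetical; the arithmetic follows the formulas for μ_G, σ_G, and z given above, and the sample call uses the counts from Example 3 (n₁ = 14, n₂ = 22, G = 11).

```python
from math import sqrt


def runs_test_z(n1, n2, g):
    """Return (mu_G, sigma_G, z) for the large-sample runs test."""
    # Mean number of runs under the null hypothesis of randomness
    mu_g = 2 * n1 * n2 / (n1 + n2) + 1
    # Standard deviation of the number of runs under the null hypothesis
    sigma_g = sqrt(
        2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    # Standardized test statistic
    z = (g - mu_g) / sigma_g
    return mu_g, sigma_g, z


mu_g, sigma_g, z = runs_test_z(14, 22, 11)
print(round(mu_g, 2), round(sigma_g, 2), round(z, 2))
```

The statistic is meant for the case in which n₁ or n₂ exceeds 20; for smaller samples the text uses G itself with Table 12.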


GUIDELINES


Performing a Runs Test for Randomness

IN WORDS                                     IN SYMBOLS

1. Identify the claim. State the null        State H₀ and Hₐ.
   and alternative hypotheses.

2. Specify the level of significance.        Identify α.
   (Use α = 0.05 for the runs test.)

3. Determine the number of data that         Determine n₁, n₂, and G.
   have each characteristic and the
   number of runs.

4. Determine the critical values.            If n₁ ≤ 20 and n₂ ≤ 20, use
                                             Table 12 in Appendix B.
                                             If n₁ > 20 or n₂ > 20, use
                                             Table 4 in Appendix B.

5. Find the test statistic.                  If n₁ ≤ 20 and n₂ ≤ 20, use G.
                                             If n₁ > 20 or n₂ > 20, use
                                             z = (G − μ_G)/σ_G.

6. Make a decision to reject or fail         If G is less than or equal to
   to reject the null hypothesis.            the lower critical value or
                                             greater than or equal to the
                                             upper critical value, reject H₀.
                                             Otherwise, fail to reject H₀.
                                             Or, if z is in the rejection region,
                                             reject H₀. Otherwise, fail to
                                             reject H₀.

7. Interpret the decision in the
   context of the original claim.
EXAMPLE 2


> Using the Runs Test


As people enter a concert, an usher records where they are sitting. The results
for 13 people are shown, where L represents a lawn seat and P represents
a pavilion seat. At α = 0.05, can you conclude that the sequence of seat
locations is not random?


LLLPPLPPPLLPL


> Solution 
The claim is “the sequence of seat locations is not random.” To test this claim, 
use the following null and alternative hypotheses. 

H₀: The sequence of seat locations is random.

Hₐ: The sequence of seat locations is not random. (Claim)


To find the critical values, first determine n₁, the number of L’s; n₂, the number
of P’s; and G, the number of runs.


LLL       PP        L         PPP       LL        P         L
1st run   2nd run   3rd run   4th run   5th run   6th run   7th run


n₁ = number of L’s = 7
n₂ = number of P’s = 6
G = number of runs = 7


Because n₁ ≤ 20, n₂ ≤ 20, and α = 0.05, use Table 12 to find the lower critical
value 3 and the upper critical value 12. The test statistic is the number of runs
G = 7. Because the test statistic G is between the critical values 3 and 12, you
should fail to reject the null hypothesis.


Interpretation There is not enough evidence at the 5% level of significance 
to support the claim that the sequence of seat locations is not random. So, it 
appears that the sequence of seat locations is random. 
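
Table 12 is not reproduced in this section. As a rough cross-check on Example 2, the exact distribution of the number of runs under randomness can be computed by counting arrangements. The sketch below rests on an assumption that the text does not state: that each critical value is the most extreme value of G whose one-sided tail probability is at most α/2 = 0.025. The function names `runs_pmf` and `critical_values` are hypothetical.

```python
from math import comb


def runs_pmf(g, n1, n2):
    """P(G = g) when all arrangements of n1 and n2 symbols are equally likely."""
    total = comb(n1 + n2, n1)
    if g % 2 == 0:
        # g = 2k runs: k runs of each characteristic, either one first
        k = g // 2
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        # g = 2k + 1 runs: one characteristic has k + 1 runs, the other k
        k = (g - 1) // 2
        ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
    return ways / total


def critical_values(n1, n2, alpha=0.05):
    """Largest lower and smallest upper G with tail probability <= alpha/2 each."""
    gs = range(2, n1 + n2 + 1)
    lower, cumulative = None, 0.0
    for g in gs:
        cumulative += runs_pmf(g, n1, n2)
        if cumulative > alpha / 2:
            break
        lower = g
    upper, cumulative = None, 0.0
    for g in reversed(gs):
        cumulative += runs_pmf(g, n1, n2)
        if cumulative > alpha / 2:
            break
        upper = g
    return lower, upper


print(critical_values(7, 6))
```

Under that assumption, the call for n₁ = 7 and n₂ = 6 gives the same critical values, 3 and 12, that Example 2 reads from Table 12. Other table entries may follow a different convention, so the table remains the reference.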


> Try It Yourself 2 


The genders of 15 students as they enter a classroom are shown below, where
F represents a female and M represents a male. At α = 0.05, can you conclude
that the sequence of genders is not random?


MFFFMMFFMFMMFFF


a. Identify the claim and state H₀ and Hₐ.
b. Specify the level of significance α.
c. Determine n₁, n₂, and G.
d. Determine the critical values.
e. Find the test statistic G.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.

Answer: Page A48
EXAMPLE 3


> Using the Runs Test
You want to determine whether the selection of recently hired employees
in a large company is random with respect to gender. The genders of
36 recently hired employees are shown below, where F represents a female and
M represents a male. At α = 0.05, can you conclude that the sequence of
employees is not random?


MMFFFFMMMMMM
FFFFFMMMMMMM
FFFMMMMFMMFM


> Solution 
The claim is “the sequence of employees is not random.” To test this claim, 
use the following null and alternative hypotheses. 

H₀: The sequence of employees is random.

Hₐ: The sequence of employees is not random. (Claim)


To find the critical values, first determine n₁, the number of F’s; n₂, the
number of M’s; and G, the number of runs.


MM        FFFF      MMMMMM
1st run   2nd run   3rd run

FFFFF     MMMMMMM
4th run   5th run

FFF       MMMM      F         MM        F          M
6th run   7th run   8th run   9th run   10th run   11th run


n₁ = number of F’s = 14
n₂ = number of M’s = 22
G = number of runs = 11


Because n₂ > 20, use Table 4 in Appendix B to find the critical values. Because
the test is a two-tailed test with α = 0.05, the critical values are

−z₀ = −1.96 and z₀ = 1.96.

Before calculating the test statistic, find the values of μ_G and σ_G, as follows.

$$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1 = \frac{2(14)(22)}{14 + 22} + 1 \approx 18.11$$
The table shows the National Football League conference of each winning team
from Super Bowl I to Super Bowl XLIV, where A represents the American
Football Conference and N represents the National Football Conference.
(Source: National Football League)

Year   Conference     Year   Conference
1967   N              1989   N
1968   N              1990   N
1969   A              1991   N
1970   A              1992   N
1971   A              1993   N
1972   N              1994   N
1973   A              1995   N
1974   A              1996   N
1975   A              1997   N
1976   A              1998   A
1977   A              1999   A
1978   N              2000   N
1979   A              2001   A
1980   A              2002   A
1981   A              2003   N
1982   N              2004   A
1983   N              2005   A
1984   A              2006   A
1985   N              2007   A
1986   N              2008   N
1987   N              2009   A
1988   N              2010   N

At α = 0.05, can you conclude that the sequence of conferences
of Super Bowl winning teams is not random?
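$$\sigma_G = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} = \sqrt{\frac{2(14)(22)[2(14)(22) - 14 - 22]}{(14 + 22)^2 (14 + 22 - 1)}} \approx 2.81$$


You can find the test statistic as follows.

$$z = \frac{G - \mu_G}{\sigma_G} \approx \frac{11 - 18.11}{2.81} \approx -2.53$$


From the graph below, you can see that the test statistic z is in the rejection
region. So, you should decide to reject the null hypothesis.


[Figure: standard normal curve with rejection regions to the left of −z₀ = −1.96 and to the right of z₀ = 1.96; the test statistic z ≈ −2.53 lies in the left rejection region.]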


Interpretation You have enough evidence at the 5% level of significance
to support the claim that the sequence of employees with respect to gender is 
not random. 


> Try It Yourself 3 


Let S represent a day in a small town in which it snowed and let N represent a
day in the same town in which it did not snow. The following are the snowfall
results for the entire month of January. At α = 0.05, can you conclude that the
sequence is not random?


NNNSSNNSNSNNNNNS
NSNSNNSNSSNNNNN


a. Identify the claim and state H₀ and Hₐ.
b. Specify the level of significance α.
c. Determine n₁, n₂, and G.
d. Determine the critical values.
e. Find the test statistic z.
f. Decide whether to reject the null hypothesis.
g. Interpret the decision in the context of the original claim.

Answer: Page A48


When n₁ or n₂ is greater than 20, you can also use a P-value to perform a
hypothesis test for the randomness of the data. In Example 3, you can calculate
the P-value to be 0.0114. Because P < α, you should reject the null hypothesis.
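
The P-value quoted for Example 3 can be checked with a short Python sketch. It assumes the rounded test statistic z = −2.53 from the example and uses the standard-library `statistics.NormalDist`; the function name `two_tailed_p_value` is hypothetical.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Two-tailed P-value for a standard normal test statistic."""
    # Area in the tail beyond |z|, doubled because the runs test is two-tailed
    return 2 * NormalDist().cdf(-abs(z))


alpha = 0.05
p = two_tailed_p_value(-2.53)
print(round(p, 4))
print("reject H0" if p < alpha else "fail to reject H0")
```

Using the unrounded value of z would change the P-value slightly in the fourth decimal place but not the decision.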
BUILDING BASIC SKILLS AND VOCABULARY


1. In your own words, explain why the hypothesis test discussed in this section
is called the runs test.


2. Describe the test statistic for the runs test when the sample sizes n₁ and n₂
are less than or equal to 20 and when either n₁ or n₂ is greater than 20.


USING AND INTERPRETING CONCEPTS


Finding the Number of Runs In Exercises 3-6, determine the number of
runs in the given sequence. Then find the length of each run.


3. TFTFTTTFFFTF
4. UUDDUDUUDDUDUU
5. MFMFMFFFFPEFRFMMMFFMMMM
6. AAABBBABBAAAAAABAABABB


7. Find the values of n₁ and n₂ in Exercise 3.
8. Find the values of n₁ and n₂ in Exercise 4.
9. Find the values of n₁ and n₂ in Exercise 5.
10. Find the values of n₁ and n₂ in Exercise 6.


Finding Critical Values In Exercises 11-14, use the given sequence and Table 12 
in Appendix B to determine the number of runs that are considered too high and the 
number of runs that are considered too low for the data to be in random order. 


11.67TFTFTFTFTFTF 
12.,.$MFMMMM©MM©MM©MFFMM 
13.NSSSNNNNNSNSNSSNNWN 

my wax xxxxxXxXXYYYYYYYYYYYYYY 


Performing a Runs Test In Exercises 15-20, use the runs test to (a) identify
the claim and state H₀ and Hₐ, (b) determine the critical values using Table 4 or
Table 12 in Appendix B, (c) find the test statistic, (d) decide whether to reject or
fail to reject the null hypothesis, and (e) interpret the decision in the context of the
original claim. Use α = 0.05.


15. Coin Toss A coach records the results of the coin toss at the beginning of 
each football game for a season. The results are shown, where H represents 
heads and T represents tails. The coach claimed the tosses were not random. 
Use the runs test to test the coach’s claim. 


HTTTHTHHTTTTAHATHHA 


16. Senate The sequence shows the majority party of the U.S. Senate after
each election for a recent group of years, where R represents the 
Republican party and D represents the Democratic party. Can you 
conclude that the sequence is not random? (Source: United States Senate) 


RDDDRRRRRRRDDDDDODD 
RDDRDD 
RRRDDD 
17. Baseball The sequence shows the Major League Baseball league of
each World Series winning team from 1969 to 2009, where N represents
the National League and A represents the American League. Can you
conclude that the sequence of leagues of World Series winning teams is
not random? (Source: Major League Baseball)


NANAAANNAANNNNAAANAN
ANAAANANAAANANAANANA


18. Number Generator A number generator outputs the sequence of digits
shown, where O represents an odd digit and E represents an even digit.
Test the claim that the digits were not randomly generated.


OOOQOEEEEOOQOOOOEEEE
OOEEEEQOOQOOQOQEEEEOO


19. Dog Identifications A team of veterinarians records, in order, the
gender of every dog that is microchipped at their pet hospital in one
month. The genders of recently microchipped dogs are shown, where F
represents a female and M represents a male. A veterinarian claims that
the microchips are random by gender. Do you have enough evidence to
reject the doctor’s claim?


MMFMFFFFFMMMFFF
MFFFFFMFFFMFFF


20. Golf Tournament A golf tournament official records whether each
past winner is American-born (A) or foreign-born (F). The results are
shown for every year the tournament has existed. Can you conclude that
the sequence is not random?


FFAFFAFFAFFAFFAFFAFFFFFF
AFFAFFAFFAFFAFAFFAFFFFFA
FFFFFAFFFA

EXTENDING CONCEPTS


Runs Test with Quantitative Data In Exercises 21-23, use the following
information to perform a runs test. You can also use the runs test for randomness
with quantitative data. First, calculate the median. Then assign + to those values
above the median and − to those values below the median. Ignore any values that
are equal to the median. Use α = 0.05.
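
The conversion of quantitative data into a sequence of + and − signs can be sketched in Python. The function name `signs_about_median` and the sample values are hypothetical and are not taken from the exercises; the median comes from the standard-library `statistics.median`.

```python
from statistics import median


def signs_about_median(values):
    """Map each value to '+' (above the median) or '-' (below the median)."""
    m = median(values)
    # Values equal to the median are ignored, as described above
    return "".join("+" if v > m else "-" for v in values if v != m)


# Illustrative data only
sample = [3, 8, 5, 9, 1, 7, 5]
print(signs_about_median(sample))
```

The resulting string has two characteristics, so n₁, n₂, and G can be counted from it exactly as for the P/F sequences earlier in the section.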


21. Daily High Temperatures The sequence shows the daily high
temperatures (in degrees Fahrenheit) for a city during the month of
July. Test the claim that the daily high temperatures do not occur
randomly.


84 87 92 93 95 84 82 83 81 87 92 98 99 93 84 85
86 92 91 95 84 92 83 81 87 92 98 89 93 84 85


22. Exam Scores The sequence shows the exam scores of a class based on
the order in which the students finished the test. Test the claim that the
scores occur randomly.


83 94 80 76 92 89 65 75 82 87 90 91 81 99 97 72
72 89 90 92 87 76 74 66 88 81 90 92 89 76 80

23. Use a technology tool to generate a sequence of 30 numbers from 1 to 99, 
inclusive. Test the claim that the sequence of numbers is not random. 
USES AND ABUSES


Uses 


Nonparametric Tests Before you could perform many of the hypothesis 
tests you learned about in previous chapters, you had to ensure that certain 
conditions about the population were satisfied. For instance, before you could 
run a t-test, you had to verify that the population was normally distributed.
One advantage of the nonparametric tests shown in this chapter is that they 
are distribution free. That is, they do not require any particular information 
about the population or populations being tested. Another advantage of 
nonparametric tests is that they are easier to perform than their parametric 
counterparts. This means that they are easier to understand and quicker to use. 
Nonparametric tests can often be used when data are at the nominal or 
ordinal level. 


Abuses 


Insufficient Evidence Stronger evidence is needed to reject a null hypothesis 
in a nonparametric test than in a corresponding parametric test. That is, when 
you are trying to support a claim represented by the alternative hypothesis, 
you might need a larger sample when performing a nonparametric test. If the 
outcome of a nonparametric test results in failure to reject the null hypothesis, 
you should investigate the sample size used. It may be that a larger sample will 
produce different results. 


Using an Inappropriate Test In general, when information about the 
population (such as the condition of normality) is known, it is more efficient 
to use a parametric test. However, if information about the population is not 
known, nonparametric tests can be helpful. 


EXERCISES


1. Insufficient Evidence Give an example of a nonparametric test in which 
there is not enough evidence to reject the null hypothesis. 


2. Using an Inappropriate Test Discuss the nonparametric tests described 
in this chapter and match each test with its parametric counterpart, which 
you studied in earlier chapters. 
CHAPTER SUMMARY


What did you learn?                                                        EXAMPLE(S)   REVIEW EXERCISES


Section 11.1

- How to use the sign test to test a population median                     1, 2         1-3, 6

  $$z = \frac{(x + 0.5) - 0.5n}{\sqrt{n}/2}$$

- How to use the paired-sample sign test to test the difference            3            4, 5
  between two population medians (dependent samples)


Section 11.2

- How to use the Wilcoxon signed-rank test and the Wilcoxon rank sum       1, 2         7, 8
  test to test the difference between two population distributions

  $$z = \frac{R - \mu_R}{\sigma_R}, \quad \mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}, \quad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$


Section 11.3

- How to use the Kruskal-Wallis test to determine whether three or more    1            9, 10
  samples were selected from populations having the same distribution

  $$H = \frac{12}{N(N + 1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k} \right) - 3(N + 1)$$


Section 11.4

- How to use the Spearman rank correlation coefficient to determine        1            11, 12
  whether the correlation between two variables is significant

  $$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$


Section 11.5

- How to use the runs test to determine whether a data set is random       1-3          13, 14

  $$G = \text{number of runs}, \quad z = \frac{G - \mu_G}{\sigma_G}, \quad \mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \quad \sigma_G = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$

The table summarizes parametric and nonparametric tests. Always use
the parametric test if the conditions for that test are satisfied.


Test application                        Parametric test                           Nonparametric test

One-sample tests                        z-test for a population mean              Sign test for a population median
                                        t-test for a population mean
Two-sample tests
  Dependent samples                     t-test for the difference between means   Paired-sample sign test
                                                                                  Wilcoxon signed-rank test
  Independent samples                   z-test for the difference between means   Wilcoxon rank sum test
                                        t-test for the difference between means
Tests involving three or more samples   One-way ANOVA                             Kruskal-Wallis test
Correlation                             Pearson correlation coefficient           Spearman rank correlation coefficient
Randomness                              (No parametric test)                      Runs test
REVIEW EXERCISES


SECTION 11.1


In Exercises 1-6, use a sign test to test the claim by doing the following.


(a) Identify the claim and state H₀ and Hₐ.

(b) Determine the critical value. 

(c) Find the test statistic. 

(d) Decide whether to reject or fail to reject the null hypothesis. 
(e) Interpret the decision in the context of the original claim. 


1. A bank manager claims that the median number of customers per day is 
no more than 650. The number of bank customers per day for 17 randomly 
selected days are listed below. At α = 0.01, can you reject the bank
manager’s claim?


675 665 601 642 554 653 639 650 645 
550 677 569 650 660 682 689 590 


2. A company claims that the median credit score for U.S. adults is at least 710. 
The credit scores for 13 randomly selected U.S. adults are listed below. 
At a = 0.05, can you reject the company’s claim? (Adapted from Fair Isaac 
Corporation) 


750 782 805 695 700 706 625 
589 690 772 745 704 710 


3. A government agency estimates that the median sentence length for all 
federal prisoners is 2 years. In a random sample of 180 federal prisoners, 
65 have sentence lengths that are less than 2 years, 109 have sentence lengths 
that are more than 2 years, and 6 have sentence lengths that are 2 years. 
At a = 0.10, can you reject the agency’s claim? (Adapted from United States 
Sentencing Commission) 


4. In a study testing the effects of calcium supplements on blood pressure in

men, 10 randomly selected men were given a calcium supplement for 
12 weeks. The following measurements are for each subject’s diastolic blood 
pressure taken before and after the 12-week treatment period. At a = 0.05, 
can you reject the claim that there was no reduction in diastolic blood 
pressure? (Adapted from the American Medical Association) 


9 10 
136 | 102 
125 104 
5. In a study testing the effects of an herbal supplement on blood pressure
in men, 11 randomly selected men were given an herbal supplement for 
12 weeks. The following measurements are for each subject’s diastolic 
blood pressure taken before and after the 12-week treatment period. 
At a = 0.05, can you reject the claim that there was no reduction in 
diastolic blood pressure? (Adapted from The Journal of the American 
Medical Association) 


109 «112 «102 «298s 114119 


6. An association claims that the median annual salary of lawyers 9 months after 
graduation from law school is $68,500. In a random sample of 125 lawyers 
9 months after graduation from law school, 76 were paid less than $68,500, and 
49 were paid more than $68,500. At a = 0.05, can you reject the association’s 
claim? (Adapted from National Association of Law Placement) 


SECTION 11.2


In Exercises 7 and 8, use a Wilcoxon test to test the claim by doing the following.

(a) Decide whether the samples are dependent or independent; then choose the
appropriate Wilcoxon test.

(b) Identify the claim and state H₀ and Hₐ.

(c) Determine the critical values. 

(d) Find the test statistic. 

(e) Decide whether to reject or fail to reject the null hypothesis. 


(f) Interpret the decision in the context of the original claim. 


7. A career placement advisor estimates that there is a difference in the
total times required to earn a doctorate degree by female and male
graduate students. The table shows the total times to earn a doctorate for
a random sample of 12 female and 12 male graduate students. At
α = 0.01, can you support the advisor’s claim? (Adapted from National
Opinion Research Council)


Female   13  12  10  13  12   9  11  14   7   7   9  10
Male     11   8   9  11  10   8   8  10  11   9  10   8
8. A medical researcher claims that a new drug affects the number of headache
hours experienced by headache sufferers. The number of headache hours
(per day) experienced by eight randomly selected patients before and after
taking the drug are shown in the table. At α = 0.05, can you support the
researcher’s claim?


Patient                    1     2     3     4     5     6     7     8
Headache hours (before)    0.9   2.3   2.7   2.4   2.9   1.9   4.2   3.4
Headache hours (after)     1.4   1.5   1.4   1.8   1.3   0.6   0.7   1.9


SECTION 11.3


In Exercises 9 and 10, use the Kruskal-Wallis test to test the claim by doing the
following.


(a) Identify the claim and state H₀ and Hₐ.
(b) Determine the critical value. 
(c) Find the sums of the ranks for each sample and calculate the test statistic. 


(d) Decide whether to reject or fail to reject the null hypothesis. 


(e) Interpret the decision in the context of the original claim. 

9. The table shows the ages for a random sample of doctorate recipients
in three fields of study. At α = 0.01, can you conclude that the
distributions of the ages of the doctorate recipients in these three fields
of study are different? (Adapted from Survey of Earned Doctorates)


Life sciences       31  32  34  31  30  32  35  31  32  34  29
Physical sciences   30  31  32  31  30  29  31  30  32  33  30
Social sciences     32  35  31  33  34  31  35  36  32  30  33


10. The table shows the starting salary offers for a random sample of
college graduates in four fields of engineering. At α = 0.05, can you
conclude that the distributions of the starting salaries in these four fields
of engineering are different? (Adapted from National Association of Colleges
and Employers)


Chemical engineering     66.4  63.9  69.7  68.5  62.3  67.9  65.5  63.7  67.4  69.1
Computer engineering     61.1  60.5  58.7  59.3  62.4  65.5  59.9  63.1  61.4  59.3
Electrical engineering   59.3  57.9  58.5  56.8  60.0  59.7  61.3  60.5  59.5  59.8
Mechanical engineering   59.9  seo   59.0  57.1  59.0  58.7  61.5  62.0  58.3  56.1
SECTION 11.4


In Exercises 11 and 12, use the Spearman rank correlation coefficient to test the
claim by doing the following.


(a) Identify the claim mathematically and state H₀ and Hₐ.
(b) Determine the critical value using Table 10 in Appendix B. 
(c) Find the test statistic. 


(d) Decide whether to reject the null hypothesis. 
(e) Interpret the decision in the context of the original claim. 


11. The table shows the overall scores and the prices for seven randomly 
selected Blu-ray™ players. The overall score is based mainly on picture 
quality. At a = 0.05, can you conclude that there is a correlation between 
overall score and price? (Source: Consumer Reports) 


OR 3 fo | ls | | os 


500 300 , 500) 150) 250.) 200 -~——:130 


12. The table shows the overall scores and the prices per gallon for nine 
randomly selected interior paints. The overall score represents hiding, surface 
smoothness, and resistance to staining, scrubbing, gloss change, sticking, 
mildew, and fading. At a = 0.10, can you conclude that there is a correlation 
between overall score and price? (Adapted from Consumer Reports) 


SECTION 11.5


In Exercises 13 and 14, use the runs test to (a) identify the claim and state H₀
and Hₐ, (b) determine the critical values using Table 4 or Table 12 in Appendix B,
(c) calculate the test statistic, (d) decide whether to reject or fail to reject the null
hypothesis, and (e) interpret the decision in the context of the original claim.
Use α = 0.05.


13. A highway patrol officer stops speeding vehicles on an interstate
highway. The following shows the genders of the last 25 drivers who were
stopped, where F represents a female driver and M represents a male
driver. Can you conclude that the stops were not random by gender?


FMMMFMFMFFFMM
FFFMMMFMMFFM


14. The following data represent the departure status of the last 18 buses to 
leave a bus station, where T represents a bus that departed on time and 
L represents a bus that departed late. Can you conclude that the departure 
status of the buses is not random? 


TTTTLLLLT 
LELLTTTrrTrirTT 
CHAPTER QUIZ


Take this quiz as you would take a quiz in class. After you are done, check your
work against the answers given in the back of the book.


For each exercise, (a) identify the claim and state H₀ and Hₐ, (b) decide which test
to use, (c) determine the critical value(s), (d) find the test statistic, (e) decide whether
to reject or fail to reject the null hypothesis, and (f) interpret the decision in the
context of the original claim.


1. A labor organization claims that there is a difference in the hourly
earnings of union and nonunion workers in state and local governments.
A random sample of 10 union and 10 nonunion workers in state and local 
governments and their hourly earnings are listed in the tables. At 
a = 0.10, can you support the organization’s claim? (Adapted from U.S. 
Bureau of Labor Statistics) 


27.20 25.60 29.75 32.97 30.33 24.80 21.75 19.85 25.60 20.70 
25.30 24.80 26.50 25.05 24.20 23.40 21.15 20.90 20.05 19.10 


2. An organization claims that the median number of annual volunteer hours is 52. 
In a random sample of 75 people who volunteered last year, 47 volunteered for 
less than 52 hours, 23 volunteered for more than 52 hours, and 5 volunteered for 
52 hours. At a = 0.05, can you reject the organization’s claim? (Adapted from 
U.S. Bureau of Labor Statistics) 


3. The table shows the sales prices for a random sample of apartment
condominiums and cooperatives in four U.S. regions. At α = 0.01, can you
conclude that the distributions of the sales prices in these four regions are
different? (Adapted from National Association of Realtors)


Northeast   252.5  245.5  237.9  270.2  265.9  250.0  259.4  238.6
Midwest     188.9  205.1  200.9  175.9  170.5  191.9  185.3  187.1
South       175.5  150.9  149.8  164.6  169.5  190.5  172.6  161.0
West        218.5  201.9  255.7  230.0  189.9  225.7  220.0  206.3


4. A meteorologist wants to determine whether days with rain occur


randomly in April in his home town. To do so, the meteorologist records 
whether it rains for each day in April. The results are shown, where R 
represents a day with rain and N represents a day with no rain. At 
a = 0.05, can the meteorologist conclude that days with rain are not 
random? 


NRRNNNNRNRRNRRR 
NRRRRNNNNRNRNNR 


5. The table shows the number of larceny-thefts (per 100,000 population) and 
the number of motor vehicle thefts (per 100,000 population) in six randomly 
selected large U.S. cities. At α = 0.10, can you conclude that there is a
correlation between the number of larceny-thefts and the number of motor
vehicle thefts? (Source: U.S. Department of Justice)


Larceny-thefts         1403  1506  2937  3449  2728  3042
Motor vehicle thefts    161   608   659   897   774   945
Real Statistics — Real Decisions


In a recent year, according to the Bureau of Labor Statistics, the median
number of years that wage and salary workers had been with their
current employer (called employee tenure) was 4.1 years. Information
on employee tenure has been gathered since 1996 using the Current
Population Survey (CPS), a monthly survey of about 60,000 households
that provides information on employment, unemployment, earnings,
demographics, and other characteristics of the U.S. population ages 16
and over. With respect to employee tenure, the questions measure how
long workers have been with their current employers, not how long
they plan to stay with their employers. (www.bls.gov)


1. How Would You Do It?
(a) What sampling technique would you use to select the sample for
the CPS?
(b) Do you think the technique in part (a) will give you a sample
that is representative of the U.S. population? Why or why not?
(c) Identify possible flaws or biases in the survey on the basis of the
technique you chose in part (a).


2. Is There a Difference?
A congressional representative claims that the median tenure for
workers from the representative’s district is less than the national
median tenure of 4.1 years. The claim is based on the representative’s
data and is shown in the table for Exercise 2. (Assume that
the employees were randomly selected.)
(a) Is it possible that the claim is true? What questions should you
ask about how the data were collected?
(b) How would you test the representative’s claim? Can you use a
parametric test, or do you need to use a nonparametric test?
(c) State the null hypothesis and the alternative hypothesis.
(d) Test the claim using α = 0.05. What can you conclude?


3. Comparing Male and Female Employee Tenures
A congressional representative claims that there is a difference
between the median tenures for male workers and female workers.
The claim is based on the representative’s data and is shown in the
table for Exercise 3. (Assume that the employees were randomly
selected from the representative’s district.)
(a) How would you test the representative’s claim? Can you use a
parametric test, or do you need to use a nonparametric test?
(b) State the null hypothesis and the alternative hypothesis.
(c) Test the claim using α = 0.05. What can you conclude?


TABLE FOR EXERCISE 2
Employee Tenure of 20 Workers

4.6  2.6  3.3
2.8  1.5  1.9
4.0  5.0  3.9
5.1  3.7  5.4
3.6  3.9  6.2
1.7  4.6  3.1
4A   3.6


TABLE FOR EXERCISE 3
Employee tenure for a sample of male workers | Employee tenure for a sample of female workers

3.9  4.4
4.4  4.9
4.7  5.4
4.3  re
4s   on
3.8  1.8
3.6  5.1
4.7  5.1
2.3  3.3
6.5  2.2
Ko   5.9
a    ve
is
4.0
TECHNOLOGY MINITAB TI-83/84 PLUS 


U.S. INCOME AND ECONOMIC RESEARCH 
The National Bureau of Economic Research (NBER) is a private, 
nonprofit, nonpartisan research organization. The NBER provides 
information for better understanding of how the U.S. economy 
works. Researchers at the NBER concentrate on four types of 
empirical research: developing new statistical measurements, 
estimating quantitative models of economic behavior, assessing 
the effects of public policies on the U.S. economy, and projecting 
the effects of alternative policy proposals. 

One of the NBER’s interests is the median income of people in 
different regions of the United States. The table below shows 
the annual incomes (in dollars) of a random sample of people 
(15 years and over) in a recent year in four U.S. regions: Northeast, 
Midwest, South, and West. 

Northeast  Midwest  South   West 
47,000     30,035   24,030  41,180 
35,145     32,235   37,943  35,298 
31,497     31,010   36,280  29,114 
27,500     37,660   38,738  36,180 
28,500     36,224   22,275  38,558 
35,400     35,510   27,975  27,680 
33,810     34,535   28,275  33,080 
32,500     46,035   35,073  44,930 
29,950     23,331   39,730  29,408 
25,100     44,213   36,775  26,180 
42,700     29,405   25,675  32,956 
49,950     31,695   29,875  40,744 


EXERCISES 


In Exercises 1-5, refer to the annual income of 
people in the table. Use α = 0.05 for all tests. 

1. Construct a box-and-whisker plot for each 
region. Do the median annual incomes appear to 
differ between regions? 

2. Use a technology tool to perform a sign test to 
test the claim that the median annual income in 
the Midwest is greater than $30,000. 

3. Use a technology tool to perform a Wilcoxon 
rank sum test to test the claim that the median 
annual incomes in the Northeast and South are 
the same. 

4. Use a technology tool to perform a Kruskal-Wallis 
test to test the claim that the distributions of annual 
incomes for all four regions are the same. 

5. Use a technology tool to perform a one-way 
ANOVA to test the claim that the average annual 
incomes for all four regions are the same. Assume 
that the populations of incomes are normally 
distributed, the samples are independent, and the 
population variances are equal. How do your 
results compare with those in Exercise 4? 

6. Repeat Exercises 1, 3, 4, and 5 using the data in the 
following table. The table shows the annual 
incomes (in dollars) of a random sample of families 
in a recent year in four U.S. regions: Northeast, 
Midwest, South, and West. 

58,010   57,680  55,200  61,808 
107,465  83,260  57,787  70,125 
75,800   39,060  49,400  51,982 
106,770  55,260  90,200  67,330 
65,780   72,216  55,209  53,830 
51,500   52,048  35,200  79,220 
46,366   67,760  60,300  61,108 
66,750   61,860  38,756  86,130 
86,955 
48,800 64,920 | 64,621 
| ogni | aaa 
| ~-? : 
44,795   62,260  77,650  66,650 
65,650   59,596  51,085  47,910 
58,500   70,510  59,200  62,364 
72,800   61,460  45,100  66,880 
57,799   77,960  55,562  46,923 
73,520 57,260 


Extended solutions are given in the Technology Supplement. 
Technical instruction is provided for MINITAB, Excel, and the TI-83/84 Plus. 
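
The sign test in Exercise 2 can also be sketched without a dedicated technology tool. The following minimal Python sketch assumes that the Midwest incomes are the second column of the first income table (an assumption about the table layout) and computes an exact one-tailed binomial p-value. The function name `sign_test_greater` is illustrative only, and a textbook procedure based on a critical value for the smaller sign count would be an equally valid route.

```python
from math import comb

def sign_test_greater(data, median0):
    """One-tailed sign test for the alternative: median > median0."""
    plus = sum(1 for v in data if v > median0)
    minus = sum(1 for v in data if v < median0)  # values equal to median0 are dropped
    n = plus + minus
    # Under the null hypothesis each sign is equally likely, so the
    # number of minus signs follows a Binomial(n, 0.5) distribution.
    p_value = sum(comb(n, k) for k in range(minus + 1)) / 2 ** n
    return plus, minus, p_value

# Midwest incomes, read as the second column of the table (assumed layout).
midwest = [30035, 32235, 31010, 37660, 36224, 35510,
           34535, 46035, 23331, 44213, 29405, 31695]

plus, minus, p_value = sign_test_greater(midwest, 30000)
print(plus, minus, round(p_value, 4))
```

A small p-value (compared with α = 0.05) would support the claim that the median exceeds $30,000, provided the column assignment above is correct.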
TABLE FOR EXERCISE 1 

Men, x   Women, y 
10.80    12.20 
10.38    11.90 
10.30    11.50 
10.30    12.20 
10.79    11.67 
10.62    11.82 
10.32    11.18 
10.06    11.49 
 9.95    11.08 
10.14    11.07 
10.06    11.08 
10.25    11.06 
 9.99    10.97 
 9.92    10.54 
 9.96    10.82 
 9.84    10.94 
 9.87    10.75 
 9.85    10.93 
 9.69    10.78 
CUMULATIVE REVIEW 


Chapters 9 - 11 


1. The table at the left shows the winning times (in seconds) for the men’s 
and women’s 100-meter runs in the Summer Olympics from 1928 to 2008. 


(a) Display the data in a scatter plot, calculate the correlation coefficient 
r, and make a conclusion about the type of correlation. 


(b) Test the level of significance of the correlation coefficient r found in 
part (a). Use α = 0.05. 


(c) Find the equation of the regression line for the data. Draw the 
regression line on the scatter plot. 


(d) Use the regression line to predict the women’s 100-meter time when 
the men’s 100-meter time is 9.90 seconds. 


2. An employment agency claims that there is a difference in the weekly earnings 
of workers who are union members and workers who are not union members. 
A random sample of nine union members and eight nonunion members and 
their weekly earnings (in dollars) are shown in the table. At α = 0.05, can you 
support the agency’s claim? 


Union member        855  994  692  800  884  991  1040  904  930 
Not a union member  758  691  862  557  655  814  803   638 


3. An investment company claims that the median age of people with mutual 
funds is 50 years. The ages (in years) of 20 randomly selected mutual fund 
owners are listed below. At α = 0.01, is there enough evidence to reject 
the company’s claim? 


46 34 33 27 58 64 54 36 38 42 
26 51 49 44 46 50 39 34 51 63 


4. The table below shows the residential natural gas expenditures 
(in dollars) in one year for a random sample of households in four regions 
of the United States. Assume that the populations are normally 
distributed and the population variances are equal. At α = 0.10, can 
you reject the claim that the mean expenditures are equal for all four 
regions? (Adap 

Northeast  Midwest  South  West 
1478       393      434    625 
649        980      319    538 
834        609      694    1045 
1173       1157     678    497 
1013 865 305 
1565 peal || oan 
655        870      451    349 
648        810      1021   633 


5. The equation used to predict sweet potato yield (in pounds per acre) is 
$\hat{y} = 11{,}182 + 174.53x_1 - 104.41x_2$, where $x_1$ is the number of acres 
planted (in thousands) and $x_2$ is the number of acres harvested (in thousands). 
Use the multiple regression equation to predict the y-values for the given 
values of the independent variables. 


(a) $x_1 = 91$, $x_2 = 88$ (b) $x_1 = 110$, $x_2 = 98$ 
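
Evaluating a multiple regression equation is a matter of substituting the given values. The short Python sketch below assumes the equation reads y = 11,182 + 174.53x1 - 104.41x2 (the subscripts are garbled in the source, so the assignment of x1 to acres planted and x2 to acres harvested is an assumption), and the function name `predict_yield` is illustrative.

```python
def predict_yield(x1, x2):
    """Predicted sweet potato yield (pounds per acre).

    x1: acres planted (in thousands), x2: acres harvested (in thousands).
    Coefficients are taken from the equation in Exercise 5 as read here.
    """
    return 11182 + 174.53 * x1 - 104.41 * x2

for x1, x2 in [(91, 88), (110, 98)]:
    print(x1, x2, round(predict_yield(x1, x2), 2))
```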
[FIGURE FOR EXERCISE 8: pie chart of intended contributions to college costs; recoverable label: None 5%] 


6. A school administrator reports that the standard deviations of reading 
test scores for eighth grade students are the same in Colorado and Utah. 
A random sample of 16 test scores from Colorado has a standard deviation of 
34.6 points and a random sample of 15 test scores from Utah has a standard 
deviation of 33.2 points. At α = 0.10, can you reject the administrator’s claim? 
Assume the samples are independent and each population has a normal 
distribution. (Adapted from National Center for Education Statistics) 


7. An employment agency representative wants to determine whether there is a 
difference in the annual household incomes in four regions of the United States. 
To do so, the representative randomly selects several households in each region 
and records the annual household income for each in the table. At α = 0.01, 
can the representative conclude that the distributions of the annual household 
incomes in these regions are different? (Adapted from U.S. Census Bureau) 


Northeast  54.3  47.1  55.7  54.8  50.0  52.5  51.6 
Midwest    49.3  54.4  45.2  48.5  50.7  51.8  52.0 
South      44.4  45.6  49.2  41.5  46.4  49.2  47.0 
West       56.8  54.7  51.4  53.5  52.4  54.0  55.9 


8. Results from a previous survey asking U.S. 
parents how much they intend to contribute 
to the college costs of their children are shown 
in the pie chart. To determine whether this 
distribution is still the same, you randomly 
select 900 U.S. parents and ask them how 
much they intend to contribute to the college 
costs of their children. The results are shown in 
the table. At α = 0.05, are the distributions 
different? (Adapted from Sallie Mae, Inc.) 

None  31 
nae ad 
Half  277 
Most  305 
All   123 


9. The table shows the metacarpal bone lengths (in centimeters) and the 
heights (in centimeters) of nine adults. The equation of the regression 
line is $\hat{y} = 1.700x + 94.428$. (Adapted from the American Journal of 
Physical Anthropology) 


Metacarpal bone length, x  45   51   39   41   48   49   46   43   47 
Height, y                  171  178  157  163  172  183  173  175  173 


(a) Find the coefficient of determination and interpret the results. 
(b) Find the standard error of estimate $s_e$ and interpret the results. 


(c) Construct a 95% prediction interval for the height of an adult when his 
or her metacarpal bone length is 50 centimeters. Interpret the results. 
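
The quantities in parts (a) and (b) follow directly from the explained, unexplained, and total variation about the given regression line. The sketch below is a minimal illustration in Python, assuming the data are read from the table as listed in the code (the seventh height is read as 173, which is an assumption because the source is garbled) and using the rounded line y = 1.700x + 94.428 rather than refitting it. It also evaluates the point prediction at x = 50, which is the center of the interval asked for in part (c).

```python
from math import sqrt

# Data as read from the table; the seventh height (173) is an assumed reading.
x = [45, 51, 39, 41, 48, 49, 46, 43, 47]            # metacarpal bone length
y = [171, 178, 157, 163, 172, 183, 173, 175, 173]   # height

def y_hat(xi):
    """Regression line given in the exercise."""
    return 1.700 * xi + 94.428

n = len(x)
y_bar = sum(y) / n
explained = sum((y_hat(xi) - y_bar) ** 2 for xi in x)             # explained variation
total = sum((yi - y_bar) ** 2 for yi in y)                        # total variation
unexplained = sum((yi - y_hat(xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation

r_squared = explained / total        # coefficient of determination
s_e = sqrt(unexplained / (n - 2))    # standard error of estimate
point_prediction = y_hat(50)         # predicted height at x = 50

print(round(r_squared, 3), round(s_e, 3), round(point_prediction, 3))
```

Because the line's coefficients are rounded, the results can differ slightly in the last decimal place from values obtained by refitting the line to the data.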


10. The table shows the overall scores and the prices of eight all-season tires. The 
overall score represents safety-related tests, such as braking, handling, and 
resistance to hydroplaning. At α = 0.10, can you conclude that there is a 
correlation between the overall score and the price? Use the Spearman rank 
correlation coefficient. (Source: Consumer Reports) 


Overall score  74  82  78  84   80  64  70  74 
Price          77  96  77  116  98  67  70  81 
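
The Spearman rank correlation coefficient can be sketched in a few lines. The Python example below assumes the first table row holds the overall scores and the second the prices, assigns tied values the mean of their ranks, and uses the common shortcut formula r_s = 1 - 6Σd²/(n(n² - 1)). With ties present, this shortcut is an approximation to the correlation of the ranks. The helper names are illustrative.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(a, b):
    """Shortcut formula based on squared rank differences."""
    n = len(a)
    d_squared = sum((ra - rb) ** 2
                    for ra, rb in zip(average_ranks(a), average_ranks(b)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

scores = [74, 82, 78, 84, 80, 64, 70, 74]
prices = [77, 96, 77, 116, 98, 67, 70, 81]
r_s = spearman_rs(scores, prices)
print(round(r_s, 3))
```

The computed r_s would then be compared with the critical value for n = 8 and α = 0.10 from a table of Spearman critical values.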
APPENDIX A 


In this appendix, we use a 0-to-z table as an alternative development of the standard 
normal distribution. It is intended that this appendix be used after completion of the 
“Properties of a Normal Distribution” subsection of Section 5.1 in the text. If used, 
this appendix should replace the material in the “Standard Normal Distribution” 
subsection of Section 5.1 except for the exercises. 


Standard Normal Distribution (0-to-z) 


z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09 
0.0  .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359 
0.1  .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753 
0.2  .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141 
0.3  .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517 
0.4  .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879 
0.5  .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224 
0.6  .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549 
0.7  .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852 
0.8  .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133 
0.9  .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389 
1.0  .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621 
1.1  .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830 
1.2  .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015 
1.3  .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177 
1.4  .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319 
1.5  .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441 
1.6  .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545 
1.7  .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633 
1.8  .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706 
1.9  .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767 
2.0  .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817 
2.1  .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857 
2.2  .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890 
2.3  .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916 
2.4  .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936 
2.5  .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952 
2.6  .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964 
2.7  .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974 
2.8  .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981 
2.9  .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986 
3.0  .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990 
3.1  .4990  .4991  .4991  .4991  .4992  .4992  .4992  .4992  .4993  .4993 
3.2  .4993  .4993  .4994  .4994  .4994  .4994  .4994  .4995  .4995  .4995 
3.3  .4995  .4995  .4995  .4996  .4996  .4996  .4996  .4996  .4996  .4997 
3.4  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4998 


Reprinted with permission of Gale Mosteller, executor of estate of Frederick Mosteller, 3830 13th Street 
North, Arlington, VA 22201 mosteller.g@ei.com. 
APPENDIX A 


WHAT YOU SHOULD LEARN 


How to find areas under the standard normal curve 


INSIGHT 


Because every normal distribution can be transformed to the standard 
normal distribution, you can use z-scores and the standard normal curve 
to find areas (and therefore probabilities) under any normal curve. 


STUDY TIP 


It is important that you know the difference between x and z. 
The random variable x is sometimes called a raw score and represents 
values in a nonstandard normal distribution, whereas z represents 
values in the standard normal distribution. 
Alternative Presentation of the 
Standard Normal Distribution 


The Standard Normal Distribution 


THE STANDARD NORMAL DISTRIBUTION 


There are infinitely many normal distributions, each with its own mean and 
standard deviation. The normal distribution with a mean of 0 and a standard 
deviation of 1 is called the standard normal distribution. The horizontal scale of 
the graph of the standard normal distribution corresponds to z-scores. In Section 
2.5, you learned that a z-score is a measure of position that indicates the number 
of standard deviations a value lies from the mean. Recall that you can transform 
an x-value to a z-score using the formula 


$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$
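
As a quick illustration of the transformation, the Python sketch below converts a few x-values to z-scores. The mean of 100 and standard deviation of 15 are hypothetical values chosen only for the example; they do not come from the text.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Hypothetical population parameters, for illustration only.
mean, std_dev = 100.0, 15.0
for x in (70.0, 100.0, 122.5):
    print(x, z_score(x, mean, std_dev))
```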


DEFINITION 


The standard normal distribution is a normal distribution with a mean of 0 and 
a standard deviation of 1. 


Standard Normal Distribution 


If each data value of a normally distributed random variable x is transformed 
into a z-score, the result will be the standard normal distribution. When this 
transformation takes place, the area that falls in the interval under the nonstandard 
normal curve is the same as that under the standard normal curve within the 
corresponding z-boundaries. 

In Section 2.4, you learned to use the Empirical Rule to approximate areas 
under a normal curve when values of the random variable x corresponded to 
-3, -2, -1, 0, 1, 2, or 3 standard deviations from the mean. Now, you will learn 
to calculate areas corresponding to other x-values. After you transform an 
x-value to a z-score, you can use the Standard Normal Table (0-to-z) on page A1. 
The table lists the area under the standard normal curve between 0 and the given 
z-score. As you examine the table, notice the following. 


PROPERTIES OF THE STANDARD NORMAL 
DISTRIBUTION 


1. The distribution is symmetric about the mean (z = 0). 


2. The area under the standard normal curve to the left of z = 0 is 0.5 and the 
area to the right of z = 0 is 0.5. 


3. The area under the standard normal curve increases as the distance 
between 0 and z increases. 
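
The entries of a 0-to-z table can be reproduced numerically, which makes the three properties easy to check. The sketch below is one way to do this in Python using the error function from the standard library; the function name `area_0_to_z` is illustrative and the approach is an alternative to reading the printed table, not the method used in the text.

```python
from math import erf, sqrt

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z.

    Uses the identity area = 0.5 * erf(|z| / sqrt(2)); the absolute value
    reflects the symmetry of the curve about z = 0.
    """
    return 0.5 * erf(abs(z) / sqrt(2))

print(round(area_0_to_z(1.15), 4))    # same entry as row 1.1, column .05
print(round(area_0_to_z(-1.15), 4))   # symmetry: same area for the negative z-score
```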
STUDY TIP 


When the z-score is not in the table, use the entry closest to it. 
If the given z-score is exactly midway between two z-scores, then use 
the area midway between the corresponding areas. 


[Figures for Example 1: Area = 0.3749 between z = 0 and z = 1.15; Area = 0.0948 between z = -0.24 and z = 0; Area = 0.0948 between z = 0 and z = 0.24] 


At first glance, the table on page A1 appears to give areas for positive 
z-scores only. However, because of the symmetry of the standard normal curve, 
the table also gives areas for negative z-scores (see Example 1). 


EXAMPLE 1 


Using the Standard Normal Table (0-to-z) 


1. Find the area under the standard normal curve between z = 0 and z = 1.15. 


2. Find the z-scores that correspond to an area of 0.0948. 


Solution 


1. Find the area that corresponds to z = 1.15 by finding 1.1 in the left column 
and then moving across the row to the column under 0.05. The number 
in that row and column is 0.3749. So, the area between z = 0 and z = 1.15 
is 0.3749. 

[Table excerpt: in the Standard Normal Table (0-to-z), row 1.1 and column .05 give .3749.] 


2. Find the z-scores that correspond to an area of 0.0948 by locating 0.0948 in 
the table. The values at the beginning of the corresponding row and at the 
top of the corresponding column give the z-score. For an area of 0.0948, the 
row value is 0.2 and the column value is 0.04. So, the z-scores are z = -0.24 
and z = 0.24. 

[Table excerpt: the entry .0948 lies in row 0.2 and column .04.] 
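
The reverse lookup in part 2 of Example 1 can be checked numerically. The sketch below, offered as an alternative to scanning the printed table, uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) to find the positive z-score for a given 0-to-z area; the helper name `z_for_area` is illustrative.

```python
from statistics import NormalDist

def z_for_area(area):
    """Positive z-score whose 0-to-z area equals `area` (0 <= area < 0.5).

    The area to the left of a positive z is 0.5 + area, so the inverse
    cumulative distribution function is evaluated at that value.
    """
    return NormalDist().inv_cdf(0.5 + area)

z = z_for_area(0.0948)
# By symmetry, both -z and z bound an area of 0.0948 with z = 0.
print(round(-z, 2), round(z, 2))
```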


Try It Yourself 1 


1. Find the area under the standard normal curve between z = 0 and z = 2.19. 


Locate the given z-score and find the corresponding area in the Standard 
Normal Table (0-to-z) on page A1. 


2. Find the z-scores that correspond to an area of 0.4850. 


Locate the given area in the Standard Normal Table (0-to-z) on page A1 
and find the corresponding z-score. 

Answer: Page A49 
Use the following guidelines to find various types of areas under the standard 
normal curve. 


GUIDELINES 


Finding Areas Under the Standard Normal Curve 


1. Sketch the standard normal curve and shade the appropriate area under the curve. 


2. Use the Standard Normal Table (0-to-z) on page A1 to find the area that corresponds to the given z-score(s). 


3. Find the desired area by following the directions for each case shown. 


a. Area to the left of z 

i. When z < 0, subtract the area from 0.5. 
   1. The area between z = 0 and z = -1.23 is 0.3907. 
   2. Subtract to find the area to the left of z = -1.23: 
      0.5 - 0.3907 = 0.1093. 

ii. When z > 0, add 0.5 to the area. 
   1. The area between z = 0 and z = 1.23 is 0.3907. 
   2. Add to find the area to the left of z = 1.23: 
      0.5 + 0.3907 = 0.8907. 


b. Area to the right of z 

i. When z < 0, add 0.5 to the area. 
   1. The area between z = 0 and z = -1.23 is 0.3907. 
   2. Add to find the area to the right of z = -1.23: 
      0.5 + 0.3907 = 0.8907. 

ii. When z > 0, subtract the area from 0.5. 
   1. The area between z = 0 and z = 1.23 is 0.3907. 
   2. Subtract to find the area to the right of z = 1.23: 
      0.5 - 0.3907 = 0.1093. 


c. Area between two z-scores 

i. When the two z-scores have the same sign 
(both positive or both negative), subtract 
the smaller area from the larger area. 
   1. The area between z = 0 and z1 = 1.23 is 0.3907. 
   2. The area between z = 0 and z2 = 2.5 is 0.4938. 
   3. Subtract to find the area between z1 = 1.23 and z2 = 2.5: 
      0.4938 - 0.3907 = 0.1031. 

ii. When the two z-scores have opposite signs 
(one negative and one positive), add the 
areas. 
   1. The area between z = 0 and z1 = 1.23 is 0.3907. 
   2. The area between z = 0 and z2 = -0.5 is 0.1915. 
   3. Add to find the area between z1 = 1.23 and z2 = -0.5: 
      0.3907 + 0.1915 = 0.5822. 
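
The three cases in the guidelines translate directly into code. The following Python sketch is one possible arrangement, assuming the table entry is reproduced with the error function and rounded to four places as in the printed table; the function names are illustrative.

```python
from math import erf, sqrt

def table_area(z):
    """0-to-z table entry: area between 0 and |z|, rounded to four places."""
    return round(0.5 * erf(abs(z) / sqrt(2)), 4)

def area_left(z):
    """Case a: subtract from 0.5 when z < 0, otherwise add 0.5."""
    return 0.5 - table_area(z) if z < 0 else 0.5 + table_area(z)

def area_right(z):
    """Case b: add 0.5 when z < 0, otherwise subtract from 0.5."""
    return 0.5 + table_area(z) if z < 0 else 0.5 - table_area(z)

def area_between(z1, z2):
    """Case c: subtract areas for same signs, add them for opposite signs."""
    a1, a2 = table_area(z1), table_area(z2)
    if z1 * z2 >= 0:              # same sign (or one z-score is 0)
        return abs(a1 - a2)
    return a1 + a2                # opposite signs

print(round(area_left(-1.23), 4), round(area_right(-1.23), 4))
print(round(area_between(1.23, 2.5), 4), round(area_between(1.23, -0.5), 4))
```

The calls at the end mirror the worked cases for z = -1.23, z = 1.23, z = 2.5, and z = -0.5 in the guidelines.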
EXAMPLE 2 


Finding Area Under the Standard Normal Curve 


Find the area under the standard normal curve to the left of z = -0.99. 

Solution 

The area under the standard normal curve to the left of z = -0.99 is shown. 

[Figure: standard normal curve with Area = 0.3389 between z = -0.99 and z = 0, and Area = 0.5 - 0.3389 shaded to the left of z = -0.99] 

From the Standard Normal Table (0-to-z), the area corresponding to 
z = -0.99 is 0.3389. Because the area to the left of z = 0 is 0.5, the area to 
the left of z = -0.99 is 0.5 - 0.3389 = 0.1611. 


INSIGHT 


Because the normal distribution is a continuous probability 
distribution, the area under the standard normal curve to the 
left of a z-score gives the probability that z is less than that 
z-score. For instance, in Example 2, the area to the left of 
z = -0.99 is 0.1611. So, P(z < -0.99) = 0.1611. 


Try It Yourself 2 

Find the area under the standard normal curve to the left of z = 2.13. 


a. Draw the standard normal curve and shade the area under the curve and to 
the left of z = 2.13. 

b. Use the Standard Normal Table (0-to-z) on page A1 to find the area that 
corresponds to z = 2.13. 

c. Add 0.5 to the resulting area. Answer: Page A49 


EXAMPLE 3 


Finding Area Under the Standard Normal Curve 

Find the area under the standard normal curve to the right of z = 1.06. 


Solution 

The area under the standard normal curve to the right of z = 1.06 is shown. 

[Figure: standard normal curve with Area = 0.3554 between z = 0 and z = 1.06, and Area = 0.5 - 0.3554 shaded to the right of z = 1.06] 

From the Standard Normal Table (0-to-z), the area corresponding to z = 1.06 
is 0.3554. Because the area to the right of z = 0 is 0.5, the area to the right of 
z = 1.06 is 0.5 - 0.3554 = 0.1446. 
Try It Yourself 3 


Find the area under the standard normal curve to the right of z = -2.16. 

a. Draw the standard normal curve and shade the area below the curve and to 
the right of z = -2.16. 

b. Use the Standard Normal Table (0-to-z) on page A1 to find the area that 
corresponds to z = -2.16. 

c. Add 0.5 to the resulting area. Answer: Page A49 


According to one publication, the number of births in a recent year 
was 4,317,000. The weights of the newborns can be approximated 
by a normal distribution, as shown by the following graph. 
(Adapted from National Center for Health Statistics) 

[Figure: Weights of Newborns; horizontal axis: Weight (in grams)] 

Find the z-scores that correspond to weights of 2000, 3000, and 
4000 grams. Are any of these unusually heavy or light? 


EXAMPLE 4 


Finding Area Under the Standard Normal Curve 


Find the area under the standard normal curve between z = -1.5 and 
z = 1.25. 

Solution 

The area under the standard normal curve between z = -1.5 and z = 1.25 
is shown. 

[Figure: standard normal curve with Area = 0.4332 between z = -1.5 and z = 0, and Area = 0.3944 between z = 0 and z = 1.25] 

From the Standard Normal Table, the area corresponding to z = -1.5 is 
0.4332 and the area corresponding to z = 1.25 is 0.3944. To find the area 
between these two z-scores, add the resulting areas. 


Area = 0.4332 + 0.3944 = 0.8276 


Interpretation So, 82.76% of the area under the curve falls between z = -1.5 
and z = 1.25. 

Try It Yourself 4 

Find the area under the standard normal curve between z = -2.165 and 
z = -1.35. 


a. Draw the standard normal curve and shade the area below the curve that is 
between z = -2.165 and z = -1.35. 

b. Use the Standard Normal Table (0-to-z) on page A1 to find the areas that 
correspond to z = -2.165 and to z = -1.35. 

c. Subtract the smaller area from the larger area. Answer: Page A49 


Recall that in Section 2.5 you learned, using the Empirical Rule, that values 
lying more than two standard deviations from the mean are considered unusual. 
Values lying more than three standard deviations from the mean are considered 
very unusual. So, if a z-score is greater than 2 or less than —2, it is unusual. If a 
z-score is greater than 3 or less than —3, it is very unusual. 
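
The rule of thumb in this paragraph can be written as a small classifier. The Python sketch below is only an illustration of the two thresholds stated above; the function name `classify_z` and the label strings are assumptions made for the example.

```python
def classify_z(z):
    """Label a z-score using the 2 and 3 standard deviation thresholds."""
    if abs(z) > 3:
        return "very unusual"
    if abs(z) > 2:
        return "unusual"
    return "not unusual"

for z in (-3.4, -2.16, 0.5, 2.0, 2.5):
    print(z, classify_z(z))
```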
APPENDIX B 


Table 1— Random Numbers 


92630 78240 19267 95457 53497 23894 37708 79862 76471 66418 
79445 78735 71549 44843 26104 67318 00701 34986 66751 99723 
59654 71966 27386 50004 05358 94031 29281 18544 52429 06080 
31524 49587 76612 39789 13537 48086 59483 60680 84675 53014 
06348 76938 90379 51392 55887 71015 09209 79157 24440 30244 


28703 51709 94456 48396 73780 06436 86641 69239 57662 80181 
68108 89266 94730 95761 75023 48464 65544 96583 18911 16391 
99938 90704 93621 66330 33393 95261 95349 51769 91616 33238 
91543 73196 34449 63513 83834 99411 58826 40456 69268 48562 
42103 02781 73920 56297 72678 12249 25270 36678 21313 75767 


17138 27584 25296 28387 51350 61664 37893 05363 44143 42677 
28297 14280 54524 21618 95320 38174 60579 08089 94999 78460 
09331 56712 51333 06289 75345 08811 82711 57392 25252 30333 
31295 04204 93712 51287 05754 79396 87399 51773 33075 97061 
36146 15560 27592 42089 99281 59640 15221 96079 09961 05371 


29553 18432 13630 05529 02791 81017 49027 79031 50912 09399 
23501 22642 63081 08191 89420 67800 55137 54707 32945 64522 
57888 85846 67967 07835 11314 01545 48535 17142 08552 67457 
55336 71264 88472 04334 63919 36394 11196 92470 70543 29776 
10087 10072 55980 64688 68239 20461 89381 93809 00796 95945 


34101 81277 66090 88872 37818 72142 67140 50785 21380 16703 
53362 44940 60430 22834 14130 96593 23298 56203 92671 15925 
82975 66158 84731 19436 55790 69229 28661 13675 99318 76873 
54827 84673 22898 08094 14326 87038 42892 21127 30712 48489 
25464 59098 27436 89421 80754 89924 19097 67737 80368 08795 


67609 60214 41475 84950 40133 02546 09570 45682 50165 15609 
44921 70924 61295 51137 47596 86735 35561 76649 18217 63446 
33170 30972 98130 95828 49786 13301 36081 80761 33985 68621 
84687 85445 06208 17654 51333 02878 35010 67578 61574 20749 
71886 56450 36567 09395 96951 35507 17555 35212 69106 01679 


00475 02224 74722 14721 40215 21351 08596 45625 83981 63748 
25993 38881 68361 59560 41274 69742 40703 37993 03435 18873 
92882 53178 99195 93803 56985 53089 15305 50522 55900 43026 
25138 26810 07093 15677 60688 04410 24505 37890 67186 62829 
84631 71882 12991 83028 82484 90339 91950 74579 03539 90122 


34003 92326 12793 61453 48121 74271 28363 66561 75220 35908 
53775 45749 05734 86169 42762 70175 97310 73894 88606 19994 
59316 97885 72807 54966 60859 11932 35265 71601 55577 67715 
20479 66557 50705 26999 09854 52591 14063 30214 19890 19292 
86180 84931 25455 26044 02227 52015 21820 50599 51671 65411 


21451 68001 72710 40261 61281 13172 63819 48970 51732 54113 
98062 68375 80089 24135 72355 95428 11808 29740 81644 86610 
01788 64429 14430 94575 75153 94576 61393 96192 03227 32258 
62465 04841 43272 68702 01274 05437 22953 18946 99053 41690 
94324 31089 84159 92933 99989 89500 91586 02802 69471 68274 


05797 43984 21575 09908 70221 19791 51578 36432 33494 79888 
10395 14289 52185 09721 25789 38562 54794 04897 59012 89251 
35177 56986 25549 559730 64718 52630 31100 962384 49483 11409 
25633 89619 75882 98256 02126 72099 57183 55887 09320 73463 
16464 48280 94254 45777 45150 68865 11382 11782 22695 41988 


Reprinted from A Million Random Digits with 100,000 Normal Deviates by the Rand Corporation 
(New York: The Free Press, 1955). Copyright 1955 and 1983 by the Rand Corporation. Used by 
permission. 
APPENDIX B 


Table 2— Binomial Distribution 


This table shows the probability of x successes in n independent trials, each with probability of success p. 
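
Each entry of such a table can be recomputed from the binomial probability formula P(x) = C(n, x) p^x (1 - p)^(n - x). The Python sketch below is a minimal illustration using `math.comb` from the standard library; the function name `binomial_probability` is illustrative, and the rounding to three places is assumed to match the printed table.

```python
from math import comb

def binomial_probability(n, x, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Two sample entries, rounded to three places like the table.
print(round(binomial_probability(2, 0, 0.01), 3))
print(round(binomial_probability(9, 3, 0.30), 3))
```

Because P(x; n, p) equals P(n - x; n, 1 - p), the column for p and the column for 1 - p contain the same values in reverse order.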


                                                  p 
n  x  .01  .05  .10  .15  .20  .25  .30  .35  .40  .45  .50  .55  .60  .65  .70  .75  .80  .85  .90  .95 

2  0  .980 .902 .810 .723 .640 .563 .490 .423 .360 .303 .250 .203 .160 .123 .090 .063 .040 .023 .010 .002 
   1  .020 .095 .180 .255 .320 .375 .420 .455 .480 .495 .500 .495 .480 .455 .420 .375 .320 .255 .180 .095 
   2  .000 .002 .010 .023 .040 .063 .090 .123 .160 .203 .250 .303 .360 .423 .490 .563 .640 .723 .810 .902 

3  0  .970 .857 .729 .614 .512 .422 .343 .275 .216 .166 .125 .091 .064 .043 .027 .016 .008 .003 .001 .000 
   1  .029 .135 .243 .325 .384 .422 .441 .444 .432 .408 .375 .334 .288 .239 .189 .141 .096 .057 .027 .007 
   2  .000 .007 .027 .057 .096 .141 .189 .239 .288 .334 .375 .408 .432 .444 .441 .422 .384 .325 .243 .135 
   3  .000 .000 .001 .003 .008 .016 .027 .043 .064 .091 .125 .166 .216 .275 .343 .422 .512 .614 .729 .857 

4  0  .961 .815 .656 .522 .410 .316 .240 .179 .130 .092 .062 .041 .026 .015 .008 .004 .002 .001 .000 .000 
   1  .039 .171 .292 .368 .410 .422 .412 .384 .346 .300 .250 .200 .154 .112 .076 .047 .026 .011 .004 .000 
   2  .001 .014 .049 .098 .154 .211 .265 .311 .346 .368 .375 .368 .346 .311 .265 .211 .154 .098 .049 .014 
   3  .000 .000 .004 .011 .026 .047 .076 .112 .154 .200 .250 .300 .346 .384 .412 .422 .410 .368 .292 .171 
   4  .000 .000 .000 .001 .002 .004 .008 .015 .026 .041 .062 .092 .130 .179 .240 .316 .410 .522 .656 .815 

5  0  .951 .774 .590 .444 .328 .237 .168 .116 .078 .050 .031 .019 .010 .005 .002 .001 .000 .000 .000 .000 
   1  .048 .204 .328 .392 .410 .396 .360 .312 .259 .206 .156 .113 .077 .049 .028 .015 .006 .002 .000 .000 
   2  .001 .021 .073 .138 .205 .264 .309 .336 .346 .337 .312 .276 .230 .181 .132 .088 .051 .024 .008 .001 
   3  .000 .001 .008 .024 .051 .088 .132 .181 .230 .276 .312 .337 .346 .336 .309 .264 .205 .138 .073 .021 
   4  .000 .000 .000 .002 .006 .015 .028 .049 .077 .113 .156 .206 .259 .312 .360 .396 .410 .392 .328 .204 
   5  .000 .000 .000 .000 .000 .001 .002 .005 .010 .019 .031 .050 .078 .116 .168 .237 .328 .444 .590 .774 

6  0  .941 .735 .531 .377 .262 .178 .118 .075 .047 .028 .016 .008 .004 .002 .001 .000 .000 .000 .000 .000 
   1  .057 .232 .354 .399 .393 .356 .303 .244 .187 .136 .094 .061 .037 .020 .010 .004 .002 .000 .000 .000 
   2  .001 .031 .098 .176 .246 .297 .324 .328 .311 .278 .234 .186 .138 .095 .060 .033 .015 .006 .001 .000 
   3  .000 .002 .015 .042 .082 .132 .185 .236 .276 .303 .312 .303 .276 .236 .185 .132 .082 .042 .015 .002 
   4  .000 .000 .001 .006 .015 .033 .060 .095 .138 .186 .234 .278 .311 .328 .324 .297 .246 .176 .098 .031 
   5  .000 .000 .000 .000 .002 .004 .010 .020 .037 .061 .094 .136 .187 .244 .303 .356 .393 .399 .354 .232 
   6  .000 .000 .000 .000 .000 .000 .001 .002 .004 .008 .016 .028 .047 .075 .118 .178 .262 .377 .531 .735 

7  0  .932 .698 .478 .321 .210 .133 .082 .049 .028 .015 .008 .004 .002 .001 .000 .000 .000 .000 .000 .000 
   1  .066 .257 .372 .396 .367 .311 .247 .185 .131 .087 .055 .032 .017 .008 .004 .001 .000 .000 .000 .000 
   2  .002 .041 .124 .210 .275 .311 .318 .299 .261 .214 .164 .117 .077 .047 .025 .012 .004 .001 .000 .000 
   3  .000 .004 .023 .062 .115 .173 .227 .268 .290 .292 .273 .239 .194 .144 .097 .058 .029 .011 .003 .000 
   4  .000 .000 .003 .011 .029 .058 .097 .144 .194 .239 .273 .292 .290 .268 .227 .173 .115 .062 .023 .004 
   5  .000 .000 .000 .001 .004 .012 .025 .047 .077 .117 .164 .214 .261 .299 .318 .311 .275 .210 .124 .041 
   6  .000 .000 .000 .000 .000 .001 .004 .008 .017 .032 .055 .087 .131 .185 .247 .311 .367 .396 .372 .257 
   7  .000 .000 .000 .000 .000 .000 .000 .001 .002 .004 .008 .015 .028 .049 .082 .133 .210 .321 .478 .698 

8  0  .923 .663 .430 .272 .168 .100 .058 .032 .017 .008 .004 .002 .001 .000 .000 .000 .000 .000 .000 .000 
   1  .075 .279 .383 .385 .336 .267 .198 .137 .090 .055 .031 .016 .008 .003 .001 .000 .000 .000 .000 .000 
   2  .003 .051 .149 .238 .294 .311 .296 .259 .209 .157 .109 .070 .041 .022 .010 .004 .001 .000 .000 .000 
   3  .000 .005 .033 .084 .147 .208 .254 .279 .279 .257 .219 .172 .124 .081 .047 .023 .009 .003 .000 .000 
   4  .000 .000 .005 .018 .046 .087 .136 .188 .232 .263 .273 .263 .232 .188 .136 .087 .046 .018 .005 .000 
   5  .000 .000 .000 .003 .009 .023 .047 .081 .124 .172 .219 .257 .279 .279 .254 .208 .147 .084 .033 .005 
   6  .000 .000 .000 .000 .001 .004 .010 .022 .041 .070 .109 .157 .209 .259 .296 .311 .294 .238 .149 .051 
   7  .000 .000 .000 .000 .000 .000 .001 .003 .008 .016 .031 .055 .090 .137 .198 .267 .336 .385 .383 .279 
   8  .000 .000 .000 .000 .000 .000 .000 .000 .001 .002 .004 .008 .017 .032 .058 .100 .168 .272 .430 .663 

9  0  .914 .630 .387 .232 .134 .075 .040 .021 .010 .005 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000 
   1  .083 .299 .387 .368 .302 .225 .156 .100 .060 .034 .018 .008 .004 .001 .000 .000 .000 .000 .000 .000 
   2  .003 .063 .172 .260 .302 .300 .267 .216 .161 .111 .070 .041 .021 .010 .004 .001 .000 .000 .000 .000 
   3  .000 .008 .045 .107 .176 .234 .267 .272 .251 .212 .164 .116 .074 .042 .021 .009 .003 .001 .000 .000 
   4  .000 .001 .007 .028 .066 .117 .172 .219 .251 .260 .246 .213 .167 .118 .074 .039 .017 .005 .001 .000 
   5  .000 .000 .001 .005 .017 .039 .074 .118 .167 .213 .246 .260 .251 .219 .172 .117 .066 .028 .007 .001 
   6  .000 .000 .000 .001 .003 .009 .021 .042 .074 .116 .164 .212 .251 .272 .267 .234 .176 .107 .045 .008 
   7  .000 .000 .000 .000 .000 .001 .004 .010 .021 .041 .070 .111 .161 .216 .267 .300 .302 .260 .172 .063 
   8  .000 .000 .000 .000 .000 .000 .000 .001 .004 .008 .018 .034 .060 .100 .156 .225 .302 .368 .387 .299 
   9  .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .002 .005 .010 .021 .040 .075 .134 .232 .387 .630 
Table 2—Binomial Distribution (continued) 

Entries for n = 10, 11, 12, and 15, listed one p column at a time. Each row lists P(x) for x = 0, 1, ..., n.

p = .01
n = 10:  .904 .091 .004 .000 .000 .000 .000 .000 .000 .000 .000
n = 11:  .895 .099 .005 .000 .000 .000 .000 .000 .000 .000 .000 .000
n = 12:  .886 .107 .006 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
n = 15:  .860 .130 .009 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000

p = .05
n = 10:  .599 .315 .075 .010 .001 .000 .000 .000 .000 .000 .000
n = 11:  .569 .329 .087 .014 .001 .000 .000 .000 .000 .000 .000 .000
n = 12:  .540 .341 .099 .017 .002 .000 .000 .000 .000 .000 .000 .000 .000
n = 15:  .463 .366 .135 .031 .005 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000

p = .10
n = 10:  .349 .387 .194 .057 .011 .001 .000 .000 .000 .000 .000
n = 11:  .314 .384 .213 .071 .016 .002 .000 .000 .000 .000 .000 .000
n = 12:  .282 .377 .230 .085 .021 .004 .000 .000 .000 .000 .000 .000 .000
n = 15:  .206 .343 .267 .129 .043 .010 .002 .000 .000 .000 .000 .000 .000 .000 .000 .000

p = .15
n = 10:  .197 .347 .276 .130 .040 .008 .001 .000 .000 .000 .000
n = 11:  .167 .325 .287 .152 .054 .013 .002 .000 .000 .000 .000 .000
n = 12:  .142 .301 .292 .172 .068 .019 .004 .001 .000 .000 .000 .000 .000
n = 15:  .087 .231 .286 .218 .116 .045 .013 .003 .001 .000 .000 .000 .000 .000 .000 .000

p = .20
n = 10:  .107 .268 .302 .201 .088 .026 .006 .001 .000 .000 .000
n = 11:  .086 .236 .295 .221 .111 .039 .010 .002 .000 .000 .000 .000
n = 12:  .069 .206 .283 .236 .133 .053 .016 .003 .001 .000 .000 .000 .000
n = 15:  .035 .132 .231 .250 .188 .103 .043 .014 .003 .001 .000 .000 .000 .000 .000 .000

p = .25
n = 10:  .056 .188 .282 .250 .146 .058 .016 .003 .000 .000 .000
n = 11:  .042 .155 .258 .258 .172 .080 .027 .006 .001 .000 .000 .000
n = 12:  .032 .127 .232 .258 .194 .103 .040 .011 .002 .000 .000 .000 .000
n = 15:  .013 .067 .156 .225 .225 .165 .092 .039 .013 .003 .001 .000 .000 .000 .000 .000

p = .30
n = 10:  .028 .121 .233 .267 .200 .103 .037 .009 .001 .000 .000
n = 11:  .020 .093 .200 .257 .220 .132 .057 .017 .004 .001 .000 .000
n = 12:  .014 .071 .168 .240 .231 .158 .079 .029 .008 .001 .000 .000 .000
n = 15:  .005 .031 .092 .170 .219 .206 .147 .081 .035 .012 .003 .001 .000 .000 .000 .000

p = .35
n = 10:  .014 .072 .176 .252 .238 .154 .069 .021 .004 .000 .000
n = 11:  .009 .052 .140 .225 .243 .183 .099 .038 .010 .002 .000 .000
n = 12:  .006 .037 .109 .195 .237 .204 .128 .059 .020 .005 .001 .000 .000
n = 15:  .002 .013 .048 .111 .179 .212 .191 .132 .071 .030 .010 .002 .000 .000 .000 .000

p = .40
n = 10:  .006 .040 .121 .215 .251 .201 .111 .042 .011 .002 .000
n = 11:  .004 .027 .089 .177 .236 .221 .147 .070 .023 .005 .001 .000
n = 12:  .002 .017 .064 .142 .213 .227 .177 .101 .042 .012 .002 .000 .000
n = 15:  .000 .005 .022 .063 .127 .186 .207 .177 .118 .061 .024 .007 .002 .000 .000 .000

p = .50
n = 10:  .001 .010 .044 .117 .205 .246 .205 .117 .044 .010 .001
n = 11:  .000 .005 .027 .081 .161 .226 .226 .161 .081 .027 .005 .000
n = 12:  .000 .003 .016 .054 .121 .193 .226 .193 .121 .054 .016 .003 .000
n = 15:  .000 .000 .003 .014 .042 .092 .153 .196 .196 .153 .092 .042 .014 .003 .000 .000

p = .55
n = 10:  .000 .004 .023 .075 .160 .234 .238 .166 .076 .021 .003
n = 11:  .000 .002 .013 .046 .113 .193 .236 .206 .126 .051 .013 .001
n = 12:  .000 .001 .007 .028 .076 .149 .212 .223 .170 .092 .034 .008 .001
n = 15:  .000 .000 .001 .005 .019 .051 .105 .165 .201 .191 .140 .078 .032 .009 .002 .000

p = .60
n = 10:  .000 .002 .011 .042 .111 .201 .251 .215 .121 .040 .006
n = 11:  .000 .001 .005 .023 .070 .147 .221 .236 .177 .089 .027 .004
n = 12:  .000 .000 .002 .012 .042 .101 .177 .227 .213 .142 .064 .017 .002
n = 15:  .000 .000 .000 .002 .007 .024 .061 .118 .177 .207 .186 .127 .063 .022 .005 .000

p = .65
n = 10:  .000 .000 .004 .021 .069 .154 .238 .252 .176 .072 .014
n = 11:  .000 .000 .002 .010 .038 .099 .183 .243 .225 .140 .052 .009
n = 12:  .000 .000 .001 .005 .020 .059 .128 .204 .237 .195 .109 .037 .006
n = 15:  .000 .000 .000 .000 .002 .010 .030 .071 .132 .191 .212 .179 .111 .048 .013 .002

p = .70
n = 10:  .000 .000 .001 .009 .037 .103 .200 .267 .233 .121 .028
n = 11:  .000 .000 .001 .004 .017 .057 .132 .220 .257 .200 .093 .020
n = 12:  .000 .000 .000 .001 .008 .029 .079 .158 .231 .240 .168 .071 .014
n = 15:  .000 .000 .000 .000 .001 .003 .012 .035 .081 .147 .206 .219 .170 .092 .031 .005

p = .75
n = 10:  .000 .000 .000 .003 .016 .058 .146 .250 .282 .188 .056
n = 11:  .000 .000 .000 .001 .006 .027 .080 .172 .258 .258 .155 .042
n = 12:  .000 .000 .000 .000 .002 .011 .040 .103 .194 .258 .232 .127 .032
n = 15:  .000 .000 .000 .000 .000 .001 .003 .013 .039 .092 .165 .225 .225 .156 .067 .013

p = .80
n = 10:  .000 .000 .000 .001 .006 .026 .088 .201 .302 .268 .107
n = 11:  .000 .000 .000 .000 .002 .010 .039 .111 .221 .295 .236 .086
n = 12:  .000 .000 .000 .000 .001 .003 .016 .053 .133 .236 .283 .206 .069
n = 15:  .000 .000 .000 .000 .000 .000 .001 .003 .014 .043 .103 .188 .250 .231 .132 .035

p = .85
n = 10:  .000 .000 .000 .000 .001 .008 .040 .130 .276 .347 .197
n = 11:  .000 .000 .000 .000 .000 .002 .013 .054 .152 .287 .325 .167
n = 12:  .000 .000 .000 .000 .000 .001 .004 .019 .068 .172 .292 .301 .142
n = 15:  .000 .000 .000 .000 .000 .000 .000 .001 .003 .013 .045 .116 .218 .286 .231 .087

p = .90
n = 10:  .000 .000 .000 .000 .000 .001 .011 .057 .194 .387 .349
n = 11:  .000 .000 .000 .000 .000 .000 .002 .016 .071 .213 .384 .314
n = 12:  .000 .000 .000 .000 .000 .000 .000 .004 .021 .085 .230 .377 .282
n = 15:  .000 .000 .000 .000 .000 .000 .000 .000 .000 .002 .010 .043 .129 .267 .343 .206

p = .95
n = 10:  .000 .000 .000 .000 .000 .000 .001 .010 .075 .315 .599
n = 11:  .000 .000 .000 .000 .000 .000 .000 .001 .014 .087 .329 .569
n = 12:  .000 .000 .000 .000 .000 .000 .000 .000 .002 .017 .099 .341 .540
n = 15:  .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .005 .031 .135 .366 .463
Table 2—Binomial Distribution (continued) 

                                                     p
 n   x   .01  .05  .10  .15  .20  .25  .30  .35  .40  .45  .50  .55  .60  .65  .70  .75  .80  .85  .90  .95
16   0  .851 .440 .185 .074 .028 .010 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     1  .138 .371 .329 .210 .113 .053 .023 .009 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     2  .010 .146 .275 .277 .211 .134 .073 .035 .015 .006 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000
     3  .000 .036 .142 .229 .246 .208 .146 .089 .047 .022 .009 .003 .001 .000 .000 .000 .000 .000 .000 .000
     4  .000 .006 .051 .131 .200 .225 .204 .155 .101 .057 .028 .011 .004 .001 .000 .000 .000 .000 .000 .000
     5  .000 .001 .014 .056 .120 .180 .210 .201 .162 .112 .067 .034 .014 .005 .001 .000 .000 .000 .000 .000
     6  .000 .000 .003 .018 .055 .110 .165 .198 .198 .168 .122 .075 .039 .017 .006 .001 .000 .000 .000 .000
     7  .000 .000 .000 .005 .020 .052 .101 .152 .189 .197 .175 .132 .084 .044 .019 .006 .001 .000 .000 .000
     8  .000 .000 .000 .001 .006 .020 .049 .092 .142 .181 .196 .181 .142 .092 .049 .020 .006 .001 .000 .000
     9  .000 .000 .000 .000 .001 .006 .019 .044 .084 .132 .175 .197 .189 .152 .101 .052 .020 .005 .000 .000
    10  .000 .000 .000 .000 .000 .001 .006 .017 .039 .075 .122 .168 .198 .198 .165 .110 .055 .018 .003 .000
    11  .000 .000 .000 .000 .000 .000 .001 .005 .014 .034 .067 .112 .162 .201 .210 .180 .120 .056 .014 .001
    12  .000 .000 .000 .000 .000 .000 .000 .001 .004 .011 .028 .057 .101 .155 .204 .225 .200 .131 .051 .006
    13  .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .009 .022 .047 .089 .146 .208 .246 .229 .142 .036
    14  .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .002 .006 .015 .035 .073 .134 .211 .277 .275 .146
    15  .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .009 .023 .053 .113 .210 .329 .371
    16  .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .010 .028 .074 .185 .440

20   0  .818 .358 .122 .039 .012 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     1  .165 .377 .270 .137 .058 .021 .007 .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     2  .016 .189 .285 .229 .137 .067 .028 .010 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
     3  .001 .060 .190 .243 .205 .134 .072 .032 .012 .004 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000
     4  .000 .013 .090 .182 .218 .190 .130 .074 .035 .014 .005 .001 .000 .000 .000 .000 .000 .000 .000 .000
     5  .000 .002 .032 .103 .175 .202 .179 .127 .075 .036 .015 .005 .001 .000 .000 .000 .000 .000 .000 .000
     6  .000 .000 .009 .045 .109 .169 .192 .171 .124 .075 .037 .015 .005 .001 .000 .000 .000 .000 .000 .000
     7  .000 .000 .002 .016 .055 .112 .164 .184 .166 .122 .074 .037 .015 .005 .001 .000 .000 .000 .000 .000
     8  .000 .000 .000 .005 .022 .061 .114 .161 .180 .162 .120 .073 .035 .014 .004 .001 .000 .000 .000 .000
     9  .000 .000 .000 .001 .007 .027 .065 .116 .160 .177 .160 .119 .071 .034 .012 .003 .000 .000 .000 .000
    10  .000 .000 .000 .000 .002 .010 .031 .069 .117 .159 .176 .159 .117 .069 .031 .010 .002 .000 .000 .000
    11  .000 .000 .000 .000 .000 .003 .012 .034 .071 .119 .160 .177 .160 .116 .065 .027 .007 .001 .000 .000
    12  .000 .000 .000 .000 .000 .001 .004 .014 .035 .073 .120 .162 .180 .161 .114 .061 .022 .005 .000 .000
    13  .000 .000 .000 .000 .000 .000 .001 .005 .015 .037 .074 .122 .166 .184 .164 .112 .055 .016 .002 .000
    14  .000 .000 .000 .000 .000 .000 .000 .001 .005 .015 .037 .075 .124 .171 .192 .169 .109 .045 .009 .000
    15  .000 .000 .000 .000 .000 .000 .000 .000 .001 .005 .015 .036 .075 .127 .179 .202 .175 .103 .032 .002
    16  .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .005 .014 .035 .074 .130 .190 .218 .182 .090 .013
    17  .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .004 .012 .032 .072 .134 .205 .243 .190 .060
    18  .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .010 .028 .067 .137 .229 .285 .189
    19  .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .002 .007 .021 .058 .137 .270 .377
    20  .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .012 .039 .122 .358
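
The entries of Table 2 are values of the binomial probability formula P(x) = nCx p^x (1 - p)^(n - x), rounded to three decimal places. The short Python sketch below is not part of the original appendix; it shows one way to reproduce an entry, and the function name binomial_pmf is only an illustrative choice.

```python
from math import comb


def binomial_pmf(n: int, x: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials,
    each with success probability p (binomial probability formula)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


if __name__ == "__main__":
    # Entry for n = 20, x = 10, p = .50, rounded as in the table.
    print(f"{binomial_pmf(20, 10, 0.50):.3f}")  # 0.176

    # A full row block: all x for n = 16 at p = .30.
    print([round(binomial_pmf(16, x, 0.30), 3) for x in range(17)])
```

Because the table rounds each entry separately, the printed entries in a column may sum to slightly more or less than 1.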
Table 3—Poisson Distribution 

                                   μ
 x    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
 0  .9048 .8187 .7408 .6703 .6065 .5488 .4966 .4493 .4066 .3679
 1  .0905 .1637 .2222 .2681 .3033 .3293 .3476 .3595 .3659 .3679
 2  .0045 .0164 .0333 .0536 .0758 .0988 .1217 .1438 .1647 .1839
 3  .0002 .0011 .0033 .0072 .0126 .0198 .0284 .0383 .0494 .0613
 4  .0000 .0001 .0003 .0007 .0016 .0030 .0050 .0077 .0111 .0153
 5  .0000 .0000 .0000 .0001 .0002 .0004 .0007 .0012 .0020 .0031
 6  .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0003 .0005
 7  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001

                                   μ
 x    1.1   1.2   1.3   1.4   1.5   1.6   1.7   1.8   1.9   2.0
 0  .3329 .3012 .2725 .2466 .2231 .2019 .1827 .1653 .1496 .1353
 1  .3662 .3614 .3543 .3452 .3347 .3230 .3106 .2975 .2842 .2707
 2  .2014 .2169 .2303 .2417 .2510 .2584 .2640 .2678 .2700 .2707
 3  .0738 .0867 .0998 .1128 .1255 .1378 .1496 .1607 .1710 .1804
 4  .0203 .0260 .0324 .0395 .0471 .0551 .0636 .0723 .0812 .0902
 5  .0045 .0062 .0084 .0111 .0141 .0176 .0216 .0260 .0309 .0361
 6  .0008 .0012 .0018 .0026 .0035 .0047 .0061 .0078 .0098 .0120
 7  .0001 .0002 .0003 .0005 .0008 .0011 .0015 .0020 .0027 .0034
 8  .0000 .0000 .0001 .0001 .0001 .0002 .0003 .0005 .0006 .0009
 9  .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0002

                                   μ
 x    2.1   2.2   2.3   2.4   2.5   2.6   2.7   2.8   2.9   3.0
 0  .1225 .1108 .1003 .0907 .0821 .0743 .0672 .0608 .0550 .0498
 1  .2572 .2438 .2306 .2177 .2052 .1931 .1815 .1703 .1596 .1494
 2  .2700 .2681 .2652 .2613 .2565 .2510 .2450 .2384 .2314 .2240
 3  .1890 .1966 .2033 .2090 .2138 .2176 .2205 .2225 .2237 .2240
 4  .0992 .1082 .1169 .1254 .1336 .1414 .1488 .1557 .1622 .1680
 5  .0417 .0476 .0538 .0602 .0668 .0735 .0804 .0872 .0940 .1008
 6  .0146 .0174 .0206 .0241 .0278 .0319 .0362 .0407 .0455 .0504
 7  .0044 .0055 .0068 .0083 .0099 .0118 .0139 .0163 .0188 .0216
 8  .0011 .0015 .0019 .0025 .0031 .0038 .0047 .0057 .0068 .0081
 9  .0003 .0004 .0005 .0007 .0009 .0011 .0014 .0018 .0022 .0027
10  .0001 .0001 .0001 .0002 .0002 .0003 .0004 .0005 .0006 .0008
11  .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0002 .0002
12  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001

                                   μ
 x    3.1   3.2   3.3   3.4   3.5   3.6   3.7   3.8   3.9   4.0
 0  .0450 .0408 .0369 .0334 .0302 .0273 .0247 .0224 .0202 .0183
 1  .1397 .1304 .1217 .1135 .1057 .0984 .0915 .0850 .0789 .0733
 2  .2165 .2087 .2008 .1929 .1850 .1771 .1692 .1615 .1539 .1465
 3  .2237 .2226 .2209 .2186 .2158 .2125 .2087 .2046 .2001 .1954
 4  .1734 .1781 .1823 .1858 .1888 .1912 .1931 .1944 .1951 .1954
 5  .1075 .1140 .1203 .1264 .1322 .1377 .1429 .1477 .1522 .1563
 6  .0555 .0608 .0662 .0716 .0771 .0826 .0881 .0936 .0989 .1042
 7  .0246 .0278 .0312 .0348 .0385 .0425 .0466 .0508 .0551 .0595
 8  .0095 .0111 .0129 .0148 .0169 .0191 .0215 .0241 .0269 .0298
 9  .0033 .0040 .0047 .0056 .0066 .0076 .0089 .0102 .0116 .0132
10  .0010 .0013 .0016 .0019 .0023 .0028 .0033 .0039 .0045 .0053
11  .0003 .0004 .0005 .0006 .0007 .0009 .0011 .0013 .0016 .0019
12  .0001 .0001 .0001 .0002 .0002 .0003 .0003 .0004 .0005 .0006
13  .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0001 .0002 .0002
14  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001


Reprinted with permission from W. H. Beyer, Handbook of Tables for Probability and Statistics, 2e, CRC Press, 
Boca Raton, Florida, 1986. 
                                   μ
 x    4.1   4.2   4.3   4.4   4.5   4.6   4.7   4.8   4.9   5.0
 0  .0166 .0150 .0136 .0123 .0111 .0101 .0091 .0082 .0074 .0067
 1  .0679 .0630 .0583 .0540 .0500 .0462 .0427 .0395 .0365 .0337
 2  .1393 .1323 .1254 .1188 .1125 .1063 .1005 .0948 .0894 .0842
 3  .1904 .1852 .1798 .1743 .1687 .1631 .1574 .1517 .1460 .1404
 4  .1951 .1944 .1933 .1917 .1898 .1875 .1849 .1820 .1789 .1755
 5  .1600 .1633 .1662 .1687 .1708 .1725 .1738 .1747 .1753 .1755
 6  .1093 .1143 .1191 .1237 .1281 .1323 .1362 .1398 .1432 .1462
 7  .0640 .0686 .0732 .0778 .0824 .0869 .0914 .0959 .1002 .1044
 8  .0328 .0360 .0393 .0428 .0463 .0500 .0537 .0575 .0614 .0653
 9  .0150 .0168 .0188 .0209 .0232 .0255 .0280 .0307 .0334 .0363
10  .0061 .0071 .0081 .0092 .0104 .0118 .0132 .0147 .0164 .0181
11  .0023 .0027 .0032 .0037 .0043 .0049 .0056 .0064 .0073 .0082
12  .0008 .0009 .0011 .0014 .0016 .0019 .0022 .0026 .0030 .0034
13  .0002 .0003 .0004 .0005 .0006 .0007 .0008 .0009 .0011 .0013
14  .0001 .0001 .0001 .0001 .0002 .0002 .0003 .0003 .0004 .0005
15  .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0001 .0001 .0002

                                   μ
 x    5.1   5.2   5.3   5.4   5.5   5.6   5.7   5.8   5.9   6.0
 0  .0061 .0055 .0050 .0045 .0041 .0037 .0033 .0030 .0027 .0025
 1  .0311 .0287 .0265 .0244 .0225 .0207 .0191 .0176 .0162 .0149
 2  .0793 .0746 .0701 .0659 .0618 .0580 .0544 .0509 .0477 .0446
 3  .1348 .1293 .1239 .1185 .1133 .1082 .1033 .0985 .0938 .0892
 4  .1719 .1681 .1641 .1600 .1558 .1515 .1472 .1428 .1383 .1339
 5  .1753 .1748 .1740 .1728 .1714 .1697 .1678 .1656 .1632 .1606
 6  .1490 .1515 .1537 .1555 .1571 .1584 .1594 .1601 .1605 .1606
 7  .1086 .1125 .1163 .1200 .1234 .1267 .1298 .1326 .1353 .1377
 8  .0692 .0731 .0771 .0810 .0849 .0887 .0925 .0962 .0998 .1033
 9  .0392 .0423 .0454 .0486 .0519 .0552 .0586 .0620 .0654 .0688
10  .0200 .0220 .0241 .0262 .0285 .0309 .0334 .0359 .0386 .0413
11  .0093 .0104 .0116 .0129 .0143 .0157 .0173 .0190 .0207 .0225
12  .0039 .0045 .0051 .0058 .0065 .0073 .0082 .0092 .0102 .0113
13  .0015 .0018 .0021 .0024 .0028 .0032 .0036 .0041 .0046 .0052
14  .0006 .0007 .0008 .0009 .0011 .0013 .0015 .0017 .0019 .0022
15  .0002 .0002 .0003 .0003 .0004 .0005 .0006 .0007 .0008 .0009
16  .0001 .0001 .0001 .0001 .0001 .0002 .0002 .0002 .0003 .0003
17  .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0001
                                   μ
 x    6.1   6.2   6.3   6.4   6.5   6.6   6.7   6.8   6.9   7.0
 0  .0022 .0020 .0018 .0017 .0015 .0014 .0012 .0011 .0010 .0009
 1  .0137 .0126 .0116 .0106 .0098 .0090 .0082 .0076 .0070 .0064
 2  .0417 .0390 .0364 .0340 .0318 .0296 .0276 .0258 .0240 .0223
 3  .0848 .0806 .0765 .0726 .0688 .0652 .0617 .0584 .0552 .0521
 4  .1294 .1249 .1205 .1162 .1118 .1076 .1034 .0992 .0952 .0912
 5  .1579 .1549 .1519 .1487 .1454 .1420 .1385 .1349 .1314 .1277
 6  .1605 .1601 .1595 .1586 .1575 .1562 .1546 .1529 .1511 .1490
 7  .1399 .1418 .1435 .1450 .1462 .1472 .1480 .1486 .1489 .1490
 8  .1066 .1099 .1130 .1160 .1188 .1215 .1240 .1263 .1284 .1304
 9  .0723 .0757 .0791 .0825 .0858 .0891 .0923 .0954 .0985 .1014
10  .0441 .0469 .0498 .0528 .0558 .0588 .0618 .0649 .0679 .0710
11  .0245 .0265 .0285 .0307 .0330 .0353 .0377 .0401 .0426 .0452
12  .0124 .0137 .0150 .0164 .0179 .0194 .0210 .0227 .0245 .0264
13  .0058 .0065 .0073 .0081 .0089 .0098 .0108 .0119 .0130 .0142
14  .0025 .0029 .0033 .0037 .0041 .0046 .0052 .0058 .0064 .0071
15  .0010 .0012 .0014 .0016 .0018 .0020 .0023 .0026 .0029 .0033
16  .0004 .0005 .0005 .0006 .0007 .0008 .0010 .0011 .0013 .0014
17  .0001 .0002 .0002 .0002 .0003 .0003 .0004 .0004 .0005 .0006
18  .0000 .0001 .0001 .0001 .0001 .0001 .0001 .0002 .0002 .0002
19  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001

                                   μ
 x    7.1   7.2   7.3   7.4   7.5   7.6   7.7   7.8   7.9   8.0
 0  .0008 .0007 .0007 .0006 .0006 .0005 .0005 .0004 .0004 .0003
 1  .0059 .0054 .0049 .0045 .0041 .0038 .0035 .0032 .0029 .0027
 2  .0208 .0194 .0180 .0167 .0156 .0145 .0134 .0125 .0116 .0107
 3  .0492 .0464 .0438 .0413 .0389 .0366 .0345 .0324 .0305 .0286
 4  .0874 .0836 .0799 .0764 .0729 .0696 .0663 .0632 .0602 .0573
 5  .1241 .1204 .1167 .1130 .1094 .1057 .1021 .0986 .0951 .0916
 6  .1468 .1445 .1420 .1394 .1367 .1339 .1311 .1282 .1252 .1221
 7  .1489 .1486 .1481 .1474 .1465 .1454 .1442 .1428 .1413 .1396
 8  .1321 .1337 .1351 .1363 .1373 .1382 .1388 .1392 .1395 .1396
 9  .1042 .1070 .1096 .1121 .1144 .1167 .1187 .1207 .1224 .1241
10  .0740 .0770 .0800 .0829 .0858 .0887 .0914 .0941 .0967 .0993
11  .0478 .0504 .0531 .0558 .0585 .0613 .0640 .0667 .0695 .0722
12  .0283 .0303 .0323 .0344 .0366 .0388 .0411 .0434 .0457 .0481
13  .0154 .0168 .0181 .0196 .0211 .0227 .0243 .0260 .0278 .0296
14  .0078 .0086 .0095 .0104 .0113 .0123 .0134 .0145 .0157 .0169
15  .0037 .0041 .0046 .0051 .0057 .0062 .0069 .0075 .0083 .0090
16  .0016 .0019 .0021 .0024 .0026 .0030 .0033 .0037 .0041 .0045
17  .0007 .0008 .0009 .0010 .0012 .0013 .0015 .0017 .0019 .0021
18  .0003 .0003 .0004 .0004 .0005 .0006 .0006 .0007 .0008 .0009
19  .0001 .0001 .0001 .0002 .0002 .0002 .0003 .0003 .0003 .0004
20  .0000 .0000 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0002
21  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001
                                   μ
 x    8.1   8.2   8.3   8.4   8.5   8.6   8.7   8.8   8.9   9.0
 0  .0003 .0003 .0002 .0002 .0002 .0002 .0002 .0002 .0001 .0001
 1  .0025 .0023 .0021 .0019 .0017 .0016 .0014 .0013 .0012 .0011
 2  .0100 .0092 .0086 .0079 .0074 .0068 .0063 .0058 .0054 .0050
 3  .0269 .0252 .0237 .0222 .0208 .0195 .0183 .0171 .0160 .0150
 4  .0544 .0517 .0491 .0466 .0443 .0420 .0398 .0377 .0357 .0337
 5  .0882 .0849 .0816 .0784 .0752 .0722 .0692 .0663 .0635 .0607
 6  .1191 .1160 .1128 .1097 .1066 .1034 .1003 .0972 .0941 .0911
 7  .1378 .1358 .1338 .1317 .1294 .1271 .1247 .1222 .1197 .1171
 8  .1395 .1392 .1388 .1382 .1375 .1366 .1356 .1344 .1332 .1318
 9  .1256 .1269 .1280 .1290 .1299 .1306 .1311 .1315 .1317 .1318
10  .1017 .1040 .1063 .1084 .1104 .1123 .1140 .1157 .1172 .1186
11  .0749 .0776 .0802 .0828 .0853 .0878 .0902 .0925 .0948 .0970
12  .0505 .0530 .0555 .0579 .0604 .0629 .0654 .0679 .0703 .0728
13  .0315 .0334 .0354 .0374 .0395 .0416 .0438 .0459 .0481 .0504
14  .0182 .0196 .0210 .0225 .0240 .0256 .0272 .0289 .0306 .0324
15  .0098 .0107 .0116 .0126 .0136 .0147 .0158 .0169 .0182 .0194
16  .0050 .0055 .0060 .0066 .0072 .0079 .0086 .0093 .0101 .0109
17  .0024 .0026 .0029 .0033 .0036 .0040 .0044 .0048 .0053 .0058
18  .0011 .0012 .0014 .0015 .0017 .0019 .0021 .0024 .0026 .0029
19  .0005 .0005 .0006 .0007 .0008 .0009 .0010 .0011 .0012 .0014
20  .0002 .0002 .0002 .0003 .0003 .0004 .0004 .0005 .0005 .0006
21  .0001 .0001 .0001 .0001 .0001 .0002 .0002 .0002 .0002 .0003
22  .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0001 .0001 .0001

                                   μ
 x    9.1   9.2   9.3   9.4   9.5   9.6   9.7   9.8   9.9  10.0
 0  .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0000
 1  .0010 .0009 .0009 .0008 .0007 .0007 .0006 .0005 .0005 .0005
 2  .0046 .0043 .0040 .0037 .0034 .0031 .0029 .0027 .0025 .0023
 3  .0140 .0131 .0123 .0115 .0107 .0100 .0093 .0087 .0081 .0076
 4  .0319 .0302 .0285 .0269 .0254 .0240 .0226 .0213 .0201 .0189
 5  .0581 .0555 .0530 .0506 .0483 .0460 .0439 .0418 .0398 .0378
 6  .0881 .0851 .0822 .0793 .0764 .0736 .0709 .0682 .0656 .0631
 7  .1145 .1118 .1091 .1064 .1037 .1010 .0982 .0955 .0928 .0901
 8  .1302 .1286 .1269 .1251 .1232 .1212 .1191 .1170 .1148 .1126
 9  .1317 .1315 .1311 .1306 .1300 .1293 .1284 .1274 .1263 .1251
10  .1198 .1210 .1219 .1228 .1235 .1241 .1245 .1249 .1250 .1251
11  .0991 .1012 .1031 .1049 .1067 .1083 .1098 .1112 .1125 .1137
12  .0752 .0776 .0799 .0822 .0844 .0866 .0888 .0908 .0928 .0948
13  .0526 .0549 .0572 .0594 .0617 .0640 .0662 .0685 .0707 .0729
14  .0342 .0361 .0380 .0399 .0419 .0439 .0459 .0479 .0500 .0521
15  .0208 .0221 .0235 .0250 .0265 .0281 .0297 .0313 .0330 .0347
16  .0118 .0127 .0137 .0147 .0157 .0168 .0180 .0192 .0204 .0217
17  .0063 .0069 .0075 .0081 .0088 .0095 .0103 .0111 .0119 .0128
18  .0032 .0035 .0039 .0042 .0046 .0051 .0055 .0060 .0065 .0071
19  .0015 .0017 .0019 .0021 .0023 .0026 .0028 .0031 .0034 .0037
20  .0007 .0008 .0009 .0010 .0011 .0012 .0014 .0015 .0017 .0019
21  .0003 .0003 .0004 .0004 .0005 .0006 .0006 .0007 .0008 .0009
22  .0001 .0001 .0002 .0002 .0002 .0002 .0003 .0003 .0004 .0004
23  .0000 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0002 .0002
24  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001
                                   μ
 x     11    12    13    14    15    16    17    18    19    20
 0  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000
 1  .0002 .0001 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000
 2  .0010 .0004 .0002 .0001 .0000 .0000 .0000 .0000 .0000 .0000
 3  .0037 .0018 .0008 .0004 .0002 .0001 .0000 .0000 .0000 .0000
 4  .0102 .0053 .0027 .0013 .0006 .0003 .0001 .0001 .0000 .0000
 5  .0224 .0127 .0070 .0037 .0019 .0010 .0005 .0002 .0001 .0001
 6  .0411 .0255 .0152 .0087 .0048 .0026 .0014 .0007 .0004 .0002
 7  .0646 .0437 .0281 .0174 .0104 .0060 .0034 .0018 .0010 .0005
 8  .0888 .0655 .0457 .0304 .0194 .0120 .0072 .0042 .0024 .0013
 9  .1085 .0874 .0661 .0473 .0324 .0213 .0135 .0083 .0050 .0029
10  .1194 .1048 .0859 .0663 .0486 .0341 .0230 .0150 .0095 .0058
11  .1194 .1144 .1015 .0844 .0663 .0496 .0355 .0245 .0164 .0106
12  .1094 .1144 .1099 .0984 .0829 .0661 .0504 .0368 .0259 .0176
13  .0926 .1056 .1099 .1060 .0956 .0814 .0658 .0509 .0378 .0271
14  .0728 .0905 .1021 .1060 .1024 .0930 .0800 .0655 .0514 .0387
15  .0534 .0724 .0885 .0989 .1024 .0992 .0906 .0786 .0650 .0516
16  .0367 .0543 .0719 .0866 .0960 .0992 .0963 .0884 .0772 .0646
17  .0237 .0383 .0550 .0713 .0847 .0934 .0963 .0936 .0863 .0760
18  .0145 .0256 .0397 .0554 .0706 .0830 .0909 .0936 .0911 .0844
19  .0084 .0161 .0272 .0409 .0557 .0699 .0814 .0887 .0911 .0888
20  .0046 .0097 .0177 .0286 .0418 .0559 .0692 .0798 .0866 .0888

                                   μ
 x     11    12    13    14    15    16    17    18    19    20
21  .0024 .0055 .0109 .0191 .0299 .0426 .0560 .0684 .0783 .0846
22  .0012 .0030 .0065 .0121 .0204 .0310 .0433 .0560 .0676 .0769
23  .0006 .0016 .0037 .0074 .0133 .0216 .0320 .0438 .0559 .0669
24  .0003 .0008 .0020 .0043 .0083 .0144 .0226 .0328 .0442 .0557
25  .0001 .0004 .0010 .0024 .0050 .0092 .0154 .0237 .0336 .0446
26  .0000 .0002 .0005 .0013 .0029 .0057 .0101 .0164 .0246 .0343
27  .0000 .0001 .0002 .0007 .0016 .0034 .0063 .0109 .0173 .0254
28  .0000 .0000 .0001 .0003 .0009 .0019 .0038 .0070 .0117 .0181
29  .0000 .0000 .0001 .0002 .0004 .0011 .0023 .0044 .0077 .0125
30  .0000 .0000 .0000 .0001 .0002 .0006 .0013 .0026 .0049 .0083
31  .0000 .0000 .0000 .0000 .0001 .0003 .0007 .0015 .0030 .0054
32  .0000 .0000 .0000 .0000 .0001 .0001 .0004 .0009 .0018 .0034
33  .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0005 .0010 .0020
34  .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0006 .0012
35  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0003 .0007
36  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0004
37  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002
38  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001
39  .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001
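
The entries of Table 3 are values of the Poisson probability formula P(x) = μ^x e^(-μ) / x!, rounded to four decimal places. The Python sketch below is an illustration added here, not part of the original appendix; the function name poisson_pmf is a hypothetical choice.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x occurrences in an interval
    when the mean number of occurrences per interval is mu."""
    return mu**x * exp(-mu) / factorial(x)


if __name__ == "__main__":
    # Entry for x = 2, mu = 3.0, rounded as in the table.
    print(f"{poisson_pmf(2, 3.0):.4f}")  # 0.2240

    # One row of the first panel: x = 1 for mu = 0.1, 0.2, ..., 1.0.
    print([round(poisson_pmf(1, k / 10), 4) for k in range(1, 11)])
```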
Table 4—Standard Normal Distribution 

   z    .09   .08   .07   .06   .05   .04   .03   .02   .01   .00
-3.4  .0002 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003
-3.3  .0003 .0004 .0004 .0004 .0004 .0004 .0004 .0005 .0005 .0005
-3.2  .0005 .0005 .0005 .0006 .0006 .0006 .0006 .0006 .0007 .0007
-3.1  .0007 .0007 .0008 .0008 .0008 .0008 .0009 .0009 .0009 .0010
-3.0  .0010 .0010 .0011 .0011 .0011 .0012 .0012 .0013 .0013 .0013
-2.9  .0014 .0014 .0015 .0015 .0016 .0016 .0017 .0018 .0018 .0019
-2.8  .0019 .0020 .0021 .0021 .0022 .0023 .0023 .0024 .0025 .0026
-2.7  .0026 .0027 .0028 .0029 .0030 .0031 .0032 .0033 .0034 .0035
-2.6  .0036 .0037 .0038 .0039 .0040 .0041 .0043 .0044 .0045 .0047
-2.5  .0048 .0049 .0051 .0052 .0054 .0055 .0057 .0059 .0060 .0062
-2.4  .0064 .0066 .0068 .0069 .0071 .0073 .0075 .0078 .0080 .0082
-2.3  .0084 .0087 .0089 .0091 .0094 .0096 .0099 .0102 .0104 .0107
-2.2  .0110 .0113 .0116 .0119 .0122 .0125 .0129 .0132 .0136 .0139
-2.1  .0143 .0146 .0150 .0154 .0158 .0162 .0166 .0170 .0174 .0179
-2.0  .0183 .0188 .0192 .0197 .0202 .0207 .0212 .0217 .0222 .0228
-1.9  .0233 .0239 .0244 .0250 .0256 .0262 .0268 .0274 .0281 .0287
-1.8  .0294 .0301 .0307 .0314 .0322 .0329 .0336 .0344 .0351 .0359
-1.7  .0367 .0375 .0384 .0392 .0401 .0409 .0418 .0427 .0436 .0446
-1.6  .0455 .0465 .0475 .0485 .0495 .0505 .0516 .0526 .0537 .0548
-1.5  .0559 .0571 .0582 .0594 .0606 .0618 .0630 .0643 .0655 .0668
-1.4  .0681 .0694 .0708 .0721 .0735 .0749 .0764 .0778 .0793 .0808
-1.3  .0823 .0838 .0853 .0869 .0885 .0901 .0918 .0934 .0951 .0968
-1.2  .0985 .1003 .1020 .1038 .1056 .1075 .1093 .1112 .1131 .1151
-1.1  .1170 .1190 .1210 .1230 .1251 .1271 .1292 .1314 .1335 .1357
-1.0  .1379 .1401 .1423 .1446 .1469 .1492 .1515 .1539 .1562 .1587
-0.9  .1611 .1635 .1660 .1685 .1711 .1736 .1762 .1788 .1814 .1841
-0.8  .1867 .1894 .1922 .1949 .1977 .2005 .2033 .2061 .2090 .2119
-0.7  .2148 .2177 .2206 .2236 .2266 .2296 .2327 .2358 .2389 .2420
-0.6  .2451 .2483 .2514 .2546 .2578 .2611 .2643 .2676 .2709 .2743
-0.5  .2776 .2810 .2843 .2877 .2912 .2946 .2981 .3015 .3050 .3085
-0.4  .3121 .3156 .3192 .3228 .3264 .3300 .3336 .3372 .3409 .3446
-0.3  .3483 .3520 .3557 .3594 .3632 .3669 .3707 .3745 .3783 .3821
-0.2  .3859 .3897 .3936 .3974 .4013 .4052 .4090 .4129 .4168 .4207
-0.1  .4247 .4286 .4325 .4364 .4404 .4443 .4483 .4522 .4562 .4602
-0.0  .4641 .4681 .4721 .4761 .4801 .4840 .4880 .4920 .4960 .5000


Critical Values 
Level of Confidence c    z_c 
0.80                     1.28 
0.90                     1.645 
0.95                     1.96 
0.99                     2.575 
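
For a level of confidence c, the critical value z_c is the z-score that leaves an area of (1 - c)/2 in each tail of the standard normal curve. The following Python sketch, added for illustration and not part of the original appendix, computes it with the standard library's statistics.NormalDist; the helper name critical_z is hypothetical.

```python
from statistics import NormalDist


def critical_z(c: float) -> float:
    """Critical value z_c for a level of confidence c (0 < c < 1):
    the z-score with cumulative area (1 + c) / 2 to its left."""
    return NormalDist().inv_cdf((1 + c) / 2)


if __name__ == "__main__":
    for c in (0.80, 0.90, 0.95, 0.99):
        print(c, round(critical_z(c), 3))
```

The computed value for c = 0.99 is about 2.576; the table lists 2.575, which is a common convention when the value is read from a printed z-table by averaging the two nearest entries.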


Table A-3, pp. 681-682 from Probability and Statistics for Engineers and Scientists, 6e by Walpole, Myers, and 
Myers. Copyright 1997. Reprinted by permission of Pearson Prentice Hall, Upper Saddle River, N.J. 




Table 4—Standard Normal Distribution (continued) 

   z    .00   .01   .02   .03   .04   .05   .06   .07   .08   .09
 0.0  .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
 0.1  .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
 0.2  .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
 0.3  .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
 0.4  .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
 0.5  .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
 0.6  .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
 0.7  .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
 0.8  .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
 0.9  .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
 1.0  .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
 1.1  .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
 1.2  .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
 1.3  .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
 1.4  .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
 1.5  .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
 1.6  .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545
 1.7  .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633
 1.8  .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706
 1.9  .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
 2.0  .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
 2.1  .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
 2.2  .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
 2.3  .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
 2.4  .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936
 2.5  .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952
 2.6  .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
 2.7  .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
 2.8  .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981
 2.9  .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986
 3.0  .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990
 3.1  .9990 .9991 .9991 .9991 .9992 .9992 .9992 .9992 .9993 .9993
 3.2  .9993 .9993 .9994 .9994 .9994 .9994 .9994 .9995 .9995 .9995
 3.3  .9995 .9995 .9995 .9996 .9996 .9996 .9996 .9996 .9996 .9997
 3.4  .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9998
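
Each entry of Table 4 is the cumulative area under the standard normal curve to the left of z, rounded to four decimal places. The Python sketch below is an added illustration, not part of the original appendix; it uses the standard library's statistics.NormalDist, and the helper name cumulative_area is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def cumulative_area(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return STANDARD_NORMAL.cdf(z)


if __name__ == "__main__":
    print(f"{cumulative_area(1.96):.4f}")   # 0.9750 (row 1.9, column .06)
    print(f"{cumulative_area(-1.23):.4f}")  # 0.1093 (row -1.2, column .03)

    # Area between two z-scores is the difference of cumulative areas.
    print(f"{cumulative_area(1.0) - cumulative_area(-1.0):.4f}")
```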
Table 5—t-Distribution 

[Figures: c-confidence interval; left-tailed test; right-tailed test; two-tailed test]

        Level of
        confidence, c    0.50   0.80   0.90   0.95   0.98   0.99
        One tail, α      0.25   0.10   0.05  0.025   0.01  0.005
d.f.    Two tails, α     0.50   0.20   0.10   0.05   0.02   0.01
  1                     1.000  3.078  6.314 12.706 31.821 63.657
  2                      .816  1.886  2.920  4.303  6.965  9.925
  3                      .765  1.638  2.353  3.182  4.541  5.841
  4                      .741  1.533  2.132  2.776  3.747  4.604
  5                      .727  1.476  2.015  2.571  3.365  4.032
  6                      .718  1.440  1.943  2.447  3.143  3.707
  7                      .711  1.415  1.895  2.365  2.998  3.499
  8                      .706  1.397  1.860  2.306  2.896  3.355
  9                      .703  1.383  1.833  2.262  2.821  3.250
 10                      .700  1.372  1.812  2.228  2.764  3.169
 11                      .697  1.363  1.796  2.201  2.718  3.106
 12                      .695  1.356  1.782  2.179  2.681  3.055
 13                      .694  1.350  1.771  2.160  2.650  3.012
 14                      .692  1.345  1.761  2.145  2.624  2.977
 15                      .691  1.341  1.753  2.131  2.602  2.947
 16                      .690  1.337  1.746  2.120  2.583  2.921
 17                      .689  1.333  1.740  2.110  2.567  2.898
 18                      .688  1.330  1.734  2.101  2.552  2.878
 19                      .688  1.328  1.729  2.093  2.539  2.861
 20                      .687  1.325  1.725  2.086  2.528  2.845
 21                      .686  1.323  1.721  2.080  2.518  2.831
 22                      .686  1.321  1.717  2.074  2.508  2.819
 23                      .685  1.319  1.714  2.069  2.500  2.807
 24                      .685  1.318  1.711  2.064  2.492  2.797
 25                      .684  1.316  1.708  2.060  2.485  2.787
 26                      .684  1.315  1.706  2.056  2.479  2.779
 27                      .684  1.314  1.703  2.052  2.473  2.771
 28                      .683  1.313  1.701  2.048  2.467  2.763
 29                      .683  1.311  1.699  2.045  2.462  2.756
  ∞                      .674  1.282  1.645  1.960  2.326  2.576


Adapted from W. H. Beyer, Handbook of Tables of Probability and Statistics, 2e, 
CRC Press, Boca Raton, Florida, 1986. Reprinted with permission. 
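
As the degrees of freedom grow, the t-distribution approaches the standard normal distribution, so the last row of Table 5 can be cross-checked against normal critical values. The sketch below assumes the column relationship shown in the table header (one-tail area = (1 - c)/2) and uses Python's standard library. The names `confidence_levels` and `z_critical` are illustrative only.

```python
from statistics import NormalDist

# Levels of confidence from the Table 5 header.
confidence_levels = [0.50, 0.80, 0.90, 0.95, 0.98, 0.99]


def z_critical(c: float) -> float:
    """Standard normal critical value that leaves (1 - c) / 2 in the right tail."""
    one_tail = (1 - c) / 2
    return round(NormalDist().inv_cdf(1 - one_tail), 3)


infinity_row = [z_critical(c) for c in confidence_levels]
print(infinity_row)
```

Rows with finite degrees of freedom need a t-distribution quantile function, which the standard library does not provide, so they are not reproduced here.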


Table 6— Chi-Square Distribution 


[Figures: chi-square curves illustrating the area in the right tail and the areas in two tails.]

Degrees of                               α
freedom 0.995 0.99 0.975 0.95 0.90 0.10 0.05 0.025 0.01 0.005
1 — — 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 9.236 11.071 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.299
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.042 30.813 33.924 36.781 40.289 42.796
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.194 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993
29 13.121 14.257 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336
30 13.787 14.954 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758 59.342 63.691 66.766
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082 83.298 88.379 91.952
70 43.275 45.442 48.758 51.739 55.329 85.527 90.531 95.023 100.425 104.215
80 51.172 53.540 57.153 60.391 64.278 96.578 101.879 106.629 112.329 116.321
90 59.196 61.754 65.647 69.126 73.291 107.565 113.145 118.136 124.116 128.299
100 67.328 70.065 74.222 77.929 82.358 118.498 124.342 129.561 135.807 140.169


D. B. Owen, HANDBOOK OF STATISTICAL TABLES, A.5, Published by Addison Wesley Longman, 
Inc. Reproduced by permission of Pearson Education, Inc. All rights reserved. 
Table 7— F-Distribution

α = 0.005
d.f.N: Degrees of freedom, numerator
d.f.D: Degrees of freedom,
denominator 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 16211 20000 21615 22500 23056 23437 23715 23925 24091 24224 24426 24630 24836 24940 25044 25148 25253 25359 25465 
2 198.5 199.0 199.2 199.2 199.3 199.3 199.4 199.4 199.4 199.4 199.4 199.4 199.4 199.5 199.5 199.5 199.5 199.5 199.5
3 55.55 49.80 47.47 46.19 45.39 44.84 44.43 44.13 43.88 43.69 43.39 43.08 42.78 42.62 42.47 42.31 42.15 41.99 41.83
4 31.33 26.28 24.26 23.15 22.46 21.97 21.62 21.35 21.14 20.97 20.70 20.44 20.17 20.03 19.89 19.75 19.61 19.47 19.32
5 22.78 18.31 16.53 15.56 14.94 14.51 14.20 13.96 13.77 13.62 13.38 13.15 12.90 12.78 12.66 12.53 12.40 12.27 12.14
6 18.63 14.54 12.92 12.03 TIAGO?” O79 1057 (039 10.25 10.03 9.81 9.59 9.47 9.36 9.24 Bile! 9.00 8.88 
7 16.24 1240 1088 10.05 9.52 9.16 8.89 8.68 8.51 8.38 8.18 7.97 TIS 7.65 7.53 742 731 7.19 7.08 
8 14.69 11.04 9.60 8.81 8.30 7.95 7.69 7.50 7.34 7.21 7.01 6.81 6.61 6.50 6.40 6.29 6.18 6.06 5.95 
9 13.61 10.11 8.72 7.96 7A7 743 6.88 6.69 6.54 6.42 6.23 6.03 5.83 5.73 5.62 5.52 5.41 5.30 5.19 
10 12.83 9.43 8.08 7.34 6.87 6.54 6.30 6.12 5.97 5.85 5.66 5.47 Soi nl Wa 5.07 4.97 4.86 4.75 4.64 
11 12.73 8.91 7.60 6.88 6.42 6.10 5.86 5.68 5.54 5.42 5.24 5.05 4.86 4.76 4.65 4.55 444 4.34 4.23 
12 ee 8.51 23 6:52 6.07 5.76 5.52 535 5.20 5.09. 4.91 4.72 4.53 4.43 4.33 4.23 4.12 4.01 3.90 
13 11.37 8.19 6.93 6.23 5.79 5.48 5:25 5.08 4.94 4.82 4.64 446 4.27 4.17 4.07 3.970 3,87 3.76 3.65 
14 11.06 7,92 6.68 6.00 5.56 5.26 5.03 4.86 4.72 4.60 443 4.25 4.06 3.96 3.86 3.70 3.66 3,55 3.44 
15 10.80 7I0 6.48 5.80 5.37 5.07 4.85 4.67 4.54 442 4.25 4.07 3.88 3.79 3.69 3.58 3.48 3,37 3.26 
16 10.58 Jo 6.30 5.64 SA 4.91 4.69 4.52 4.38 4.27 4.10 3.92 373 3.64 3.54 3.48. 3.33 a 32) 
17 10.38 7.35 6.16 5.50 5.07 4.78 4.56 4.39 4.25 4.14 3.97 3.79 3.61 3.51 3.41 32K. 3.21 3.10 2.98 
18 122 721 6.03 Sree 4.96 4.66 444 4.28 4.14 4.03 3.86 3.68 3.50 3.40 3.30 3.265 2.10 2.99 Zo 
19 10.07 7.09 5.92 5.27 4.85 4.56 4.34 4.18 4.04 3.93 3.76 3.59 3.40 3.31 3.21 3.1% 3.00 2.89 2.78 
20 9.94 6.99 5.82 ells 4.76 447 4.26 4.09 3.96 3.85 3.68 3,50 3.32 3.22 3.12 3.02 2.92 2.81 2.69 
21 9.83 6.89 573 5.09 4.68 4.39 4.18 4.01 3.88 SIT 3.60 3.43 3.24 3.15 3.05. 25 2.84 273 2.61 
22 9.73 6.81 5:65) 5.02 4.61 4.32 4.11 3.94 3.81 aie] 3.54 a6 3.18 3.08 2.98 2.86 207 2.66 253) 
23 9.63 6.73 5.58 4.95 4.54 4.26 4.05 3.88 3.75 3.64 3.47 3.30 3.12 3.02 2.92 2.82 2.71 2.60 248 
24 9.55 6.66 552 4.89 449 4.20 3.99 3.83 3.69 3.59 3.42 B25 3.06 2.97 2.87 2 2.66 2.55 2.43 
25 948 6.60 5.46 4.84 443 4.15 3.94 3.78 3.64 3.54 3.37 3.20 3.01 2,92 2.82 27E: 2.61 2.50 2.38 
26 941 6.54 5.41 4.79 4.38 4.10 3.89 3,73 3.60 3.49 3:35) 3,15 2.97 2.87 277 7 2.56 2.45 2.33 
27 9.34 6.49 5.36 4.74 4.34 4.06 3.85 3.69 3.56 3.45 3.28 3.11 2.93 2.83 2/3 2.635 2.52 241 2.25 
28 9.28 6.44 Seep) 4.70 4.30 4.02 3.81 3.65 352 3.41 3.25 3.07 2.89 279) 2.69 2.59; 248 227 2.29 
29 9.23 6.40 5.28 4.66 4.26 3.98 3.77 3.61 3.48 3.38 3.21 3.04 2.86 2.76 2.66 2.26 245 2.33 2.24 
30 9.18 635 5.24 4.62 4.23 3.95 3.74 3.58 3.45 3.34 3.18 3.01 2.82 203 2.63 Ze 242 230 218 
40 8.83 6.07 4.98 4.37 3.99 3.71 351 3.35 3.22 3.12 2.95 2.78 2.60 2.50 2.40 2.30 2.18 2.06 1.93 
60 8.49 579 4.73 4.14 3.76 3.49 3.29 eal fe) 3.01 2.90 2.74 257 2.39 2.29 2.19 2.08 1.96 1.83 1.69 
120 8.18 5.54 4.50 3.92 355 3.28 3.09 2,93 2.81 271 2.54 237 2.19 2.09 1.98 1.87 175 1.61 1.43 
∞ 7.88 5.30 4.28 3.72 3.35 3.09 2.90 2.74 2.62 2.52 2.36 2.19 2.00 1.90 1.79 1.67 1.53 1.36 1.00
Table 7— F-Distribution (continued)

α = 0.01
d.f.N: Degrees of freedom, numerator
d.f.D: Degrees of freedom,
denominator 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 4052 4999.5 5403 5625 5764 5859 5928 5982 6022 6056 6106 6157 6209 6235 6261 6287 6313 6339 6366 
2 98.50 99.00 99.17 99.25 99.30 99.33 99.36 99.37 99.39 99.40 99.42 99.43 99.45 99.46 99.47 99.47 99.48 99.49 99.50
3 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.35 27.23 27.05 26.87 26.69 26.60 26.50 26.41 26.32 26.22 26.13
4 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.55 14.37 14.20 14.02 13.93 13.84 13.75 13.65 13.56 13.46
5 16.26 13.27 12.06 11.39 10.97 10.67 10.46 10.29 10.16 10.05 9.89 9.72 9.55 9.47 9.38 9.29 9.20 9.11 9.02
6 1St7oeLOLO2: 9.78 ONS) 8.75 8.47 8.26 8.10 7.98 7.87 UP 7.56 7.40 esi 723) 7.14 7.06 6.97 6.88 
7 12.25 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62 6.47 6.31 6.16 6.07 5.99 5.91 5.82 5.74 5.65 
8 11.26 8.65 Tess} 7.01 6.63 6.37 6.18 6.03 5.91 5.81 5.67 Sey 5.36 5.28 5.20 all 5.03 4.95 4.86 
9 10.56 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26 5.11 4.96 4.81 4.73 4.65 4.57 4.48 4.40 4.31 
10 10.04 7.56 6.55 3.99 5.64 Syst) 5.20 5.06 4.94 4.85 471 4.56 441 4.33 4.25 4.17 4.08 4.00 3.91 
11 9.65 7.21 6.22 5.67 3:32 5.07 4.89 4.74 4.63 4.54 4.40 4.25 4.10 4.02 3.94 3.86) 3.78 3.69 3.60 
12 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30 4.16 4.01 3.86 3.78 3.70 3.62 3.54 3.45 3.36
13 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 4.10 3.96 3.82 3.66 3.59 351 3.48 3.34 3.25 3.17 
14 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 3.94 3.80 3.66 Bt) 3.43 335 3.22 3.18 3.09 3.00 
15 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.67 3.52 3:37 3.29 3.21 3B 3.05 2.96 2.87 
16 8.53 6.23 p29) 4.77 4.44 4.20 4.03 3.89 3.78 3.69 3.55 3.41 3.26 3.18 3.10 3.Q2> 2.93 2.84 27D 
17 8.40 6.11 5.18 4.67 4.34 4.10 3:93 3.79 3.68 3.59 3.46 3.31 3.16 3.08 3.00 2.92 2.83 2.75 2.65 
18 8.29 6.01 5.09 4.58 4.25 4.01 3.84 Bil 3.60 3.51 shay/ B23 3.08 3.00 2.92. 2.84, 275) 2.66 Psy] 
19 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43 3.30 3.15 3.00 2.92 2.84 276 2.67 2.58 2.49 
20 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 345)7/ 3:23 3.09 2.94 2.86 2.78 2.68 2.61 2.52 2.42 
21 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40 3:31 3.17 3.03 2.88 2.80 2.72 2.65 255 2.46 2.36 
22 7.95 By? 4.82 4.31 3:99 3.76 B59 3.45 3335) 3.26 S512 2.98 2.83 272) 2.67 2.58 2.50 2.40 2.31 
23 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21 3.07 2.93 2.78 2.70 2.62 2.54: 2.45 2.35 2.26 
24 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 Syil7/ 3.03 2.89 2.74 2.66 2.58 a 2.40 Desi 22 
25 7.77 5.57 4.68 4.18 3.85 3.63 3.46 3.32 3.22 3.13. 2.99 2.85 2.70 2.62 2.54 2.45 2.36 2.27 2.17 
26 U-j/?2 Sys) 4.64 4.14 3.82 3.59 3.42 3:29 3.18 3.09 2.96 2.81 2.66 2.58 2.50 2.42 233 223 2213 
27 7.68 5.49 4.60 4.11 3.78 3.56 3.39 3.26 3:15 3.06 2.93 2.78 2.63 2:55 2.47 2.38 2.29 2.20 2.10 
28 7.64 5.45 4.57 4.07 eps) Bio 3.36 3.23 sh) 3.03 2.90 2.75 2.60 PLY) 2.44 235) 2.26 ali, 2.06 
29 7.60 5.42 4.54 4.04 3.73: 3.50 3:33 3.20 3.09 3.00 2.87 2.73 2.57 2.49 2.41 2.33 2.23 2.14 2.03 
30 7.56 Set) 4.51 4.02 3.70 3.47 3.30 Syil7/ 3.07 2.98 2.84 2.70 25) 2.47 2.39 2.30 edi| all 2.01 
40 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 2.80 2.66 2.52 2.37 2.29 2.20 2.11 2.02 1.92 1.80 
60 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 Pref? 2.63 2.50 2.35 2.20 2.12 2.03 1.94 1.84 les} 1.60 
120 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 2.47 2.34 2:19 2.03 1.95 1.86 1.76 1.66 1.53 1.38 
∞ 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32 2.18 2.04 1.88 1.79 1.70 1.59 1.47 1.32 1.00
Table 7— F-Distribution (continued)

α = 0.025
d.f.N: Degrees of freedom, numerator
d.f.D: Degrees of freedom,
denominator 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 647.8 799.5 864.2 899.6 921.8 937.1 948.2 956.7 963.3 968.6 976.7 984.9 993.1 997.2 1001 1006 1010 1014 1018
2 38.51 39.00 39.17 39.25 39.30 39.33 39.36 39.37 39.39 39.40 39.41 39.43 39.45 39.46 39.46 39.47 39.48 39.49 39.50
3 17.44 16.04 15.44 15.10 14.88 14.73 14.62 14.54 14.47 14.42 14.34 14.25 14.17 14.12 14.08 14.04 13.99 13.95 13.90
4 12.22 10.65 9.98 9.60 9.36 9.20 9.07 8.98 8.90 8.84 8.75 8.66 8.56 8.51 8.46 8.41 8.36 8.31 8.26
5 10.01 8.43 7.76 7.39 7.15 6.98 6.85 6.76 6.68 6.62 6.52 6.43 6.33 6.28 6.23 6.18 6.12 6.07 6.02 
6 8.81 7.26 6.60 6.23 Sek) 5.82 5.70 5.60 Sey 5.46 Doe D27, Spd D2 5.07 5.01 4.96 4.90 4.85 
7 8.07 6.54 5.89 5.52 5.29 5.12 4.99 4.90 4.82 4.76 4.67 4.57 447 442 4.36 4.31 4.25 4.20 4.14 
8 Test) 6.06 5.42 5.05 4.82 4.65 4.53 4.43 4.36 4.30 4.20 4.10 4.00 3S)5) 3.89 3.84 3.78 3.73 3.67 


9 7.21 5/1 5.08 4.72 448 4.32 4.20 4.10 4.03 3.96 3.87 3.77 3.67 3.61 3.56 3.51 3.45 3.39 3.33 
10 6.94 5.46 4.83 4.47 4.24 4.07 3.95) 3.85 3.78 3.72 3.62 3.52 3.42 337, 331 3.26 3.20 3.14 3.08 
11 6.72 5.26 4.63 4.28 4.04 3.88 3.76 3.66 3.59 3.53 3.43 3.33 3.23 3.17 3.12 3.06 3.00 2.94 2.88 
12 6.55 5.10 447 4.12 3.89 3.73 3.61 3.51 3.44 3:37, 3.28 3.18 3.07 3.02 2.96 2.91 2.85 PIfe) 2.72 
13 6.41 4.97 4.35 4.00 3.77 3.60 3.48 3.39 3.31 3.25 3.15 3.05 2.95 2.89 2.84 2.7) 2.72 2.66 2.60 
14 6.30 4.86 4.24 3\ss)2) 3.66 3.50 3.38 328) 37 SIS) 3.05 2.95 2.84 AIS) 2.73 2. 2.61 ASS) 2.49 
15 6.20 4.77 4.15 3.80 3.58 3.41 3.29 3.20 3.12 3.06 2.98 2.86 2.76 2.70 2.64 2.59 2.52 2.46 2.40 
16 6.12 4.69 4.08 3.73 3.50 3.34 322) 32) 3.05 2.99 2.89 ZI) 2.68 2.63 Psy) 2.5 2.45 2.38 232 
17 6.04 4.62 4.01 3.66 3.44 3.28 3.16 3.06 2.98 2.92 2.82 2.72 2.62 2.56 2.50 244 2.38 2.32 2.25 
18 5.98 4.56 3,95) 3.61 3.38 322) 3.10 3.01 2.93 2.87 2.77 2.67 2.56 2.50 2.44 Peep PysVvs 2.26 PLANS) 
19 5.92 451 3.90 3.56 3.33 3.17 3.05 2.96 2.88 2.82 2.72 2.62 2.51 2.45 2.39 2.3 2.27 2.20 2.13 
20 5.87 4.46 3.86 Bil 3X9) Shlls} 3.01 2.91 2.84 AST) 2.68 2.57 2.46 2.41 P3)5) 2.28. 222) 2.16 2.09 
21 5.83 442 3.82 3.48 3.25 3.09 2.97 2.87 2.80 2.73 2.64 2.53 2.42 2.37 2.31 2.25 2.18 2.11 2.04 
22 pI) 4.38 3.78 3.44 322) 3.05 293) 2.84 2.76 2.70 2.60 2.50 2.39 233) 227, 2at 2.14 2.08 2.00 
23 5.75 4.35 3./5 3.41 3.18 3.02 2.90 2.81 2.73 2.67 2.57 2.47 2.36 2.30 2.24 2 2.11 2.04 1.97 
24 S/d 4.32 3.72 3.38 S15 299) 2.87 2.78 2.70 2.64 2.54 2.44 233) 22, 221 2 2.08 2.01 1.94 
25 5.69 4.29 3.69 3.35 3.13 2.97 2.85 2.75 2.68 2.61 251 2.41 2.30 2.24 2.18 2. 2.05 1.98 1.91 
26 5.66 4.27 3.67 5}315) 3.10 2.94 2.82 AIS 2.65 ASE) 2.49 2.39 Ps) Dap 2.16 2.08 2.03 1.95 1.88 
27 5.63 4.24 3.65 3.31 3.08 2.92 2.80 2.71 2.63 2.57 2.47 2.36 2.25 2.19 2.13 2.2 2.00 1.93 1.85 
28 5.61 4.22 3.63 3.29 3.06 2.90 2.78 2.69 2.61 2.55 2.45 2.34 2.23 2.17 2.11 2.05 1.98 124 1.83 
29 5.59 4.20 3.61 3.27 3.04 2.88 2.76 2.67 2.59 2.53 2.43 2.32 2.21 2:15 2.09 2.038 1.96 1.89 1.81 
30 yey) 4.18 S519) Be) 3.03 2.87 275 2.65 Doh 2.51 2.41 2.31 2.20 2.14 2.07 2.01 1.94 1.87 Iw) 
40 5.42 4.05 3.46 3.13 2.90 2.74 2.62 2.53 2.45 2.39 2.29 2.18 2.07 2.01 1.94 1.88 1.80 1.72 1.64 
60 S49) B38 3.34 3.01 BINS) 2.63 P25) 2.41 P3)3) A22T) PAT) 2.06 1.94 1.88 1.82 1.74 1.67 1.58 1.48 

120 5.15 3.80 3.23 2.89 2.67 2.52 2.39 2.30 2.22 2.16 2.05 1.94 1.82 1.76 1.69 1.61 1.53 1.43 1.31 


∞ 5.02 3.69 3.12 2.79 2.57 2.41 2.29 2.19 2.11 2.05 1.94 1.83 1.71 1.64 1.57 1.48 1.39 1.27 1.00
Table 7— F-Distribution (continued)

α = 0.05
d.f.N: Degrees of freedom, numerator
d.f.D: Degrees of freedom,
denominator 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 161.4 199.5 215.7 224.6 230.2 234.0 236.8 238.9 240.5 241.9 243.9 245.9 248.0 249.1 250.1 251.1 252.2 253.3 254.3
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40 19.41 19.43 19.45 19.45 19.46 19.47 19.48 19.49 19.50
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.74 8.70 8.66 8.64 8.62 8.59 8.57 8.55 8.53
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.91 5.86 5.80 5.77 5.75 5.72 5.69 5.66 5.63
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.68 4.62 4.56 4.53 4.50 4.46 4.43 4.40 4.36
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.00 3.94 3.87 3.84 3.81 3.77 3.74 3.70 3.67
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.57 3.51 3.44 3.41 3.38 3.34 3.30 3.27 3.23
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.28 3.22 3.15 3.12 3.08 3.04 3.01 2.97 2.93
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.07 3.01 2.94 2.90 2.86 2.83 2.79 2.75 2.71
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.91 2.85 2.77 2.74 2.70 2.66 2.62 2.58 2.54
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.79 2.72 2.65 2.61 2.57 2.53 2.49 2.45 2.40
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.69 2.62 2.54 2.51 2.47 2.43 2.38 2.34 2.30
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.60 2.53 2.46 2.42 2.38 2.34 2.30 2.25 2.21
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60 2.53 2.46 2.39 2.35 2.31 2.27 2.22 2.18 2.13
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.48 2.40 2.33 2.29 2.25 2.20 2.16 2.11 2.07
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.42 2.35 2.28 2.24 2.19 2.15 2.11 2.06 2.01
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45 2.38 2.31 2.23 2.19 2.15 2.10 2.06 2.01 1.96
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.34 2.27 2.19 2.15 2.11 2.06 2.02 1.97 1.92
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38 2.31 2.23 2.16 2.11 2.07 2.03 1.98 1.93 1.88
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.28 2.20 2.12 2.08 2.04 1.99 1.95 1.90 1.84
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.25 2.18 2.10 2.05 2.01 1.96 1.92 1.87 1.81
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30 2.23 2.15 2.07 2.03 1.98 1.94 1.89 1.84 1.78
23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27 2.20 2.13 2.05 2.01 1.96 1.91 1.86 1.81 1.76
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.18 2.11 2.03 1.98 1.94 1.89 1.84 1.79 1.73
25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24 2.16 2.09 2.01 1.96 1.92 1.87 1.82 1.77 1.71
26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 2.15 2.07 1.99 1.95 1.90 1.85 1.80 1.75 1.69
27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.20 2.13 2.06 1.97 1.93 1.88 1.84 1.79 1.73 1.67
28 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19 2.12 2.04 1.96 1.91 1.87 1.82 1.77 1.71 1.65
29 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 2.18 2.10 2.03 1.94 1.90 1.85 1.81 1.75 1.70 1.64
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.09 2.01 1.93 1.89 1.84 1.79 1.74 1.68 1.62
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08 2.00 1.92 1.84 1.79 1.74 1.69 1.64 1.58 1.51
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99 1.92 1.84 1.75 1.70 1.65 1.59 1.53 1.47 1.39
120 3.92 3.07 2.68 2.45 2.29 2.17 2.09 2.02 1.96 1.91 1.83 1.75 1.66 1.61 1.55 1.50 1.43 1.35 1.25
∞ 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83 1.75 1.67 1.57 1.52 1.46 1.39 1.32 1.22 1.00
Table 7— F-Distribution (continued)

α = 0.10
d.f.N: Degrees of freedom, numerator
d.f.D: Degrees of freedom,
denominator 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞

1 39.86 49.50 53.59 55.83 57.24 58.20 58.91 59.44 59.86 60.19 60.71 61.22 61.74 62.00 62.26 62.53 62.79 63.06 63.33 
2 8.53 9.00 9.16 9.24 9.29 9.33 9.35 9.37 9.38 9.39 9.41 9.42 9.44 9.45 9.46 9.47 9.47 9.48 9.49
3 5.54 5.46 5.39 5.34 5.31 5.28 5.27 5.25 5.24 5.23 5.22 5.20 5.18 5.18 5.17 5.16 5.15 5.14 5.13
4 4.54 4.32 4.19 4.11 4.05 4.01 3.98 3.95 3.94 3.92 3.90 3.87 3.84 3.83 3.82 3.80 3.79 3.78 3.76
5 4.06 3.78 3.62 3.52 3.45 3.40 3.37 3.34 3.32 3.30 3.27 3.24 3.21 3.19 3.17 3.16 3.14 3.12 3.10 
6 3.78 3.46 3729 3.18 3.11 3.05 3.01 2.98 2.96 2.94 2.90 2.87 2.84 2.82 2.80 2.78 2.76 2.74 2.72 
7 3.59 3.26 3.07 2.96 2.88 2.83 2.78 2.75 272 2.70 2.67 2.63 2.59 2.58 2.56 2.54 2:51 2.49 2.47 
8 3.46 Ball 292. 2.81 23) 2.67 2.62 259) 2.56 2.54 2.50 2.46 2.42 2.40 2.38 2.36 2.34 32. 2.29 
9 3.36 3.01 2.81 2.69 2.61 2.55 2.51 2.47 2.44 2.42 2.38 2.34 2.30 2.28 2.25 2.23 2.21 2.18 2.16 
10 3.29 2.92 275 2.61 252. 2.46 2.41 2.38 235 232 2.28 2.24 2.20 2.18 2.16 2.13 Bali 2.08 2.06 
11 3.23 2.86 2.66 2.54 2.45 2.39 2.34 2.30 2.27 2.25 2.21 27 2.12 2.10 2.08 2.05 2.03 2.00 1.97 
12 3.18 2.81 2.61 2.48 2.39 2.33 2.28 2.24 2.21 2.19 2.15 2.10 2.06 2.04 2.01 1.99 1.96 1.93 1.90
13 3.14 2.76 2.56 2.43 2.35 2.28 2.23 2.20 2.16 2.14 2.10 2.05 2.01 1.98 1.96 1.93) 1.90 1.88 1.85 
14 3.10 Ait} apy? 2.39 2.31 2.24 AA) AIS) PDA \72 2.10 2.05 2.01 1.96 1.94 1.91 1.89 1.86 1.83 1.80 
15 3.07 2.70 2.49 2.36 2.27 2.21 2.16 2.12 2.09 2.06 2.02 1.97 1.92 1.90 1.87 1 & 1.82 1.79 1.76 
16 3.05 2.67 2.46 2.33 2.24 2.18 2.13 2.09 2.06 2.03 199 1.94 1.89 1.87 1.84 1.88. 1.78 1.75 eZ 
17 3.03 2.64 2.44 2.31 2.22 2.15 2.10 2.06 2.03 2.00 1.96 1.91 1.86 1.84 1.81 1 BR 1.75 1.72 1.69 
18 3.01 2.62 2.42 2.29 2.20 2a 2.08 2.04 2.00 1.98 93 1.89 1.84 1.81 1.78 Wey eZ. 1.69 1.66 
19 2.99 2.61 2.40 2.27 2.18 2.11 2.06 2.02 1.98 1.96 1.91 1.86 1.81 1.79 1.76 1.73 1.70 1.67 1.63 
20 297 2.59 2.38 Pde p'S) 2al6 2.09 2.04 2.00 1.96 1.94 1.89 1.84 I) a7 1.74 1a 1.68 1.64 1.61 
21 2.96 2:57 2.36 2.23 2.14 2.08 2.02 1.98 1.95 1.92 1.87 1.83 1.78 175 1.72 1 Rock 1.66 1.62 1.59 
22 295 2.56 235 2.22 2513 2.06 2.01 ey 1.93 1.90 1.86 1.81 1.76 les} 1.70 1 Sh 1.64 1.60 L37/ 
23 2.94 2.59 2.34 2.21 2.11 2.05 1.99 1.95 1.92 1.89 1.84 1.80 1.74 1.72 1.69 1.66 1.62 1:59 1.55 
24 2.93 2.54 233) P22) 2.10 2.04 1.98 1.94 1) 1.88 1.83 1.78 173} 1.70 1.67 ‘S 1.61 les7/ i333} 
25 2.92 2.53 2.32 2.18 2.09 2.02 1.97 1.93 1.89 1.87 1.82 1.77 1.72 1.69 1.66 1.63: 1.59 1.56 1.52 
26 2.91 Pagsy 2.31 Ads\7/ 2.08 2.01 1.96 1.92 1.88 1.86 1.81 1.76 Heal 1.68 1.65 ie 1.58 1.54 1.50 
27 2.90 2.51 2.30 2:17 2.07 2.00 1.95 1.91 1.87 1.85 1.80 1,75 1.70 1.67 1.64 1 60 1.57 1.53 1.49 
28 2.89 2.50 2.29 2.16 2.06 2.00 1.94 1.90 1.87 1.84 Iw) 1.74 1.69 1.66 1.63 1.39, 1.56 ey 1.48 
29 2.89 2.50 2.28 2.15 2.06 1.99 1.93 1.89 1.86 1.83 1.78 1,73 1.68 1.65 1.62 1.398 1.55 1.51 1.47 
30 2.88 2.49 2.28 2.14 2.05 1.98 In93 1.88 1.85 1.82 Teal 172} 1.67 1.64 1.61 ies7/ 1.54 1.50 1.46 
40 2.84 2.44 2.23 2.09 2.00 1.93 1.87 1.83 1.79 1.76 1.71 1.66 1.61 157 1.54 1.51 1.47 1.42 1.38 
60 2.79 239 2.18 2.04 1-25 1.87 1.82 a7 1.74 le 7al 1.66 1.60 1.54 1.51 1.48 1.44 1.40 1.35 129) 
120 2.75 2:35 2:13 1.99 1.90 1.82 1.77 1.72 1.68 1.65 1.60 1355 1.48 1.45 1.41 1.37 1.32 1.26 1.19 
∞ 2.71 2.30 2.08 1.94 1.85 1.77 1.72 1.67 1.63 1.60 1.55 1.49 1.42 1.38 1.34 1.30 1.24 1.17 1.00


From M. Merrington and C. M. Thompson, “Table of Percentage Points of the Inverted Beta (F) Distribution”,
Biometrika 33 (1943), pp. 74-87, by permission of Oxford University Press.


Table 8— Critical Values for the Sign Test 


Reject the null hypothesis if the test statistic is less than or equal to the value in the table. 


One-tailed, α = 0.005 α = 0.01 α = 0.025 α = 0.05
Two-tailed,
n α = 0.01 α = 0.02 α = 0.05 α = 0.10


Note: Table 8 is for one-tailed or two-tailed tests. 
The sample size n represents the total number 
of + and — signs. The test value is the smaller 
number of + or — signs. 


From Journal of American Statistical Association Vol. 41 
(1946), pp. 557-66. W. J. Dixon and A. M. Mood. 
Reprinted with permission. 
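
The note above describes the test statistic (the smaller count of + or − signs) and the rejection rule. As a rough sketch of where such critical values can come from, the code below finds the largest k whose lower-tail probability under a Binomial(n, 0.5) model does not exceed a one-tailed α. This exact-binomial rule is an assumption for illustration; the printed table may follow a slightly different convention in places, and the function names are hypothetical.

```python
from math import comb


def sign_test_statistic(signs):
    """Return (test value, n): the smaller sign count and the total number of signs."""
    plus = signs.count("+")
    minus = signs.count("-")
    return min(plus, minus), plus + minus


def sign_test_critical_value(n, alpha_one_tailed):
    """Largest k with P(X <= k) <= alpha for X ~ Binomial(n, 0.5); None if no k qualifies."""
    cumulative = 0.0
    critical = None
    for k in range(n + 1):
        cumulative += comb(n, k) / 2 ** n
        if cumulative <= alpha_one_tailed:
            critical = k
        else:
            break
    return critical


# Ties (recorded here as "0") are ignored when counting signs.
x, n = sign_test_statistic(list("++-+0+-+"))
critical = sign_test_critical_value(n, 0.05)
# Reject the null hypothesis only when a critical value exists and x <= critical.
reject = critical is not None and x <= critical
print(x, n, critical, reject)
```

For a two-tailed test at level α, the same function would be called with α/2, mirroring the paired column headings of Table 8.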


N 
UUBRRBRWWWNNNHH|H=DC0O 
Nuun BHRBPWWNYNYNYNY HS KH KH OCHO 
DNDAUUUDAARWWWNNABHABO 
NNN DDUUNUNBHRWWWNYD HH HH HS 


Table 9— Critical Values for the Wilcoxon Signed-Rank Test 


Reject the null hypothesis if the value of the test statistic w_s is less than or equal to the value given in the table.


One-tailed, α = 0.05 α = 0.025 α = 0.01 α = 0.005
Two-tailed,
n α = 0.10 α = 0.05 α = 0.02 α = 0.01
5 1 — — —
6 2 1 — —
7 4 2 0 —
8 6 4 2 0
9 8 6 3 2
10 11 8 5 3
11 14 11 7 5
12 17 14 10 7
13 21 17 13 10
14 26 21 16 13
15 30 25 20 16
16 36 30 24 19
17 41 35 28 23
18 47 40 33 28
19 54 46 38 32
20 60 52 43 37
21 68 59 49 43
22 75 66 56 49
23 83 73 62 55
24 92 81 69 61
25 101 90 77 68
26 110 98 85 76
27 120 107 93 84
28 130 117 102 92
29 141 127 111 100
30 152 137 120 109

From Some Rapid Approximate Statistical Procedures.
Copyright 1949, 1964 Lederle Laboratories, American
Cyanamid Co., Wayne, N.J. Reprinted with permission.
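
To use Table 9, the test statistic has to be computed from paired differences first. The sketch below assumes the usual definition: drop zero differences, rank the absolute differences (averaging ranks for ties), and take the smaller of the positive-rank sum and the negative-rank sum. The function name and the sample differences are illustrative and do not come from the text.

```python
def wilcoxon_signed_rank(differences):
    """Return (w_s, n): the smaller rank sum and the number of nonzero differences."""
    nonzero = [d for d in differences if d != 0]
    ordered = sorted(nonzero, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the block of tied absolute values starting at position i.
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 through j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    positive = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(positive, negative), len(nonzero)


w_s, n = wilcoxon_signed_rank([1, -2, 3, 4, -5, 0])
print(w_s, n)
```

With these made-up differences, n = 5 and w_s = 7, which exceeds the Table 9 entry of 1 for a one-tailed test at α = 0.05, so the null hypothesis would not be rejected.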


Table 10 — Critical Values for the Spearman Rank Correlation

Reject H0: ρ_s = 0 if the absolute value of r_s is greater than
the value given in the table.

n α = 0.10 α = 0.05 α = 0.01
5 0.900 — —
6 0.829 0.886 —
7 0.714 0.786 0.929
8 0.643 0.738 0.881
9 0.600 0.700 0.833
10 0.564 0.648 0.794
11 0.536 0.618 0.818
12 0.497 0.591 0.780
13 0.475 0.566 0.745
14 0.457 0.545 0.716
15 0.441 0.525 0.689
16 0.425 0.507 0.666
17 0.412 0.490 0.645
18 0.399 0.476 0.625
19 0.388 0.462 0.608
20 0.377 0.450 0.591
21 0.368 0.438 0.576
22 0.359 0.428 0.562
23 0.351 0.418 0.549
24 0.343 0.409 0.537
25 0.336 0.400 0.526
26 0.329 0.392 0.515
27 0.323 0.385 0.505
28 0.317 0.377 0.496
29 0.311 0.370 0.487
30 0.305 0.364 0.478

Reprinted with permission from the Institute of
Mathematical Statistics.


Table 11— Critical Values for the Pearson Correlation Coefficient

Reject H0: ρ = 0 if the absolute value of r is
greater than the value given in the table.

n α = 0.05 α = 0.01
4 0.950 0.990
5 0.878 0.959
6 0.811 0.917
7 0.754 0.875
8 0.707 0.834
9 0.666 0.798
10 0.632 0.765
11 0.602 0.735
12 0.576 0.708
13 0.553 0.684
14 0.532 0.661
15 0.514 0.641
16 0.497 0.623
17 0.482 0.606
18 0.468 0.590
19 0.456 0.575
20 0.444 0.561
21 0.433 0.549
22 0.423 0.537
23 0.413 0.526
24 0.404 0.515
25 0.396 0.505
26 0.388 0.496
27 0.381 0.487
28 0.374 0.479
29 0.367 0.471
30 0.361 0.463
35 0.334 0.430
40 0.312 0.403
45 0.294 0.380
50 0.279 0.361
55 0.266 0.345
60 0.254 0.330
65 0.244 0.317
70 0.235 0.306
75 0.227 0.296
80 0.220 0.286
85 0.213 0.278
90 0.207 0.270
95 0.202 0.263
100 0.197 0.256

The critical values in Table 11 were generated
using Excel.
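
The text does not say how the Excel values were produced. One standard relation that reproduces entries of this kind converts a two-tailed t critical value with n − 2 degrees of freedom into a critical correlation, r = t / sqrt(t² + n − 2). The sketch below applies that relation to t values read from Table 5; the function name is hypothetical and the method is an assumption, not the authors' documented procedure.

```python
from math import sqrt


def pearson_critical_value(t_critical: float, n: int) -> float:
    """Critical |r| for sample size n, given the two-tailed t critical value with n - 2 d.f."""
    degrees_of_freedom = n - 2
    return t_critical / sqrt(t_critical ** 2 + degrees_of_freedom)


# t values taken from the two-tails 0.05 column of Table 5 (d.f. = 2, 10, and 28).
for n, t in [(4, 4.303), (12, 2.228), (30, 2.048)]:
    print(n, round(pearson_critical_value(t, n), 3))
```

Under this assumption, the results for n = 4, 12, and 30 agree with the α = 0.05 column of Table 11 to three decimal places.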
Table 12— Critical Values for the Number of Runs

Reject the null hypothesis if the test statistic G is less than or equal to the smaller
entry or greater than or equal to the larger entry.

[Table body not recoverable: pairs of lower and upper critical values of G, arranged by the value of n1 and the value of n2.]

Note: Table 12 is for a two-tailed test with α = 0.05.

Reprinted with permission from the Institute of Mathematical Statistics.
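
The rejection rule for Table 12 depends on G, the number of runs in a two-category sequence. The sketch below counts runs and applies the stated rule. The critical values passed to `runs_test_decision` are hypothetical placeholders that would have to be read from the printed table, and both function names are illustrative.

```python
from itertools import groupby


def count_runs(sequence):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(sequence))


def runs_test_decision(g, lower, upper):
    """True when G <= the smaller entry or G >= the larger entry (reject the null hypothesis)."""
    return g <= lower or g >= upper


sequence = "MMFFFM"
g = count_runs(sequence)
n1 = sequence.count("M")
n2 = sequence.count("F")
print(g, n1, n2)

# Hypothetical critical values, for illustration only; use the table entries for n1 and n2.
print(runs_test_decision(g, lower=3, upper=10))
```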
APPENDIX C  Normal Probability Plots and Their Graphs


WHAT YOU SHOULD LEARN 


» How to construct and interpret 
a normal probability plot 


INSIGHT 


A normal probability 
plot is also called a 
normal quantile plot. 


NORMAL PROBABILITY PLOTS


For the majority of problems throughout this book, it has been assumed that 
a random sample of data is selected from a population that has a normal 
distribution. Suppose you select a random sample from a population with an 
unknown distribution. How can you determine if the sample was selected from 
a population that has a normal distribution? 

You have already learned that a histogram or stem-and-leaf plot can reveal the 
shape of a distribution and any outliers, clusters, or gaps in a distribution. These 
data displays are useful for assessing large sets of data, but assessing small data 
sets in this manner can be difficult and unreliable. A reliable method for assessing 
normality in small data sets is to use a graph called a normal probability plot. 


DEFINITION 


A normal probability plot is a graph that plots each observed value from 
the data set along with its corresponding z-score. The observed values are 
usually plotted along the horizontal axis while the corresponding z-scores are 
plotted along the vertical axis. 


If the plotted points in a normal probability plot are approximately linear, then 
you can conclude that the data come from a normal distribution. If the plotted 
points are not approximately linear or follow some type of pattern that is not 
linear, you can conclude that the data come from a distribution that is not 
normal. When examining a normal probability plot, look for deviations or 
clusters of points that stray from the line, which indicate a distribution that is 
not normal. Individual points that stray from the line in a normal probability 
plot may be outliers. 

Constructing a normal probability plot by hand can be rather tedious. 
Technology tools such as MINITAB or a TI-83/84 Plus can be used to construct 
normal probability plots, as shown in Example 1. 


EXAMPLE 1 


> Constructing a Normal Probability Plot 


The heights (in inches) of 12 current National Basketball Association players 
are listed. Use a technology tool to construct a normal probability plot to 
determine if the data come from a population that has a normal distribution. 
Identify any possible outliers. 


74, 69, 78, 75, 73, 71, 80, 82, 81, 76, 86, 77 
STUDY TIP


Here are instructions for 
constructing a normal probability 
plot using MINITAB. First, enter 
the heights into column C1. Then 
click Graph, and select Probability 
Plot. Make sure single probability 
plot is chosen and click OK. Then 
double-click C1 to select the data 
to be graphed. Click Distribution 
and make sure normal is chosen. 
Click the Data Display menu. 
Select Symbols only. Then click OK. 
Click Scale and in the Y-Scale Type 
menu, select Score, and click OK. 
Click Labels and title the graph. 
Then click OK twice. 


[Figure: MINITAB graph titled "Normal Probability Plot of Player Heights" (Normal). Horizontal axis: Player heights. Vertical axis: Score.]
Solution


Using a TI-83/84 Plus, begin by entering the data into List 1. Then use Stat Plot 
to construct the normal probability plot. The plot should look similar to the 
one shown below. From the scatter plot, it appears that there are no outliers 
and the points are approximately linear. To construct a normal probability plot 
using MINITAB, follow the instructions in the margin. 


[Figure: TI-83/84 PLUS screens, including the normal probability plot.]


Interpretation Because the points are approximately linear, you can 
conclude that the sample data come from a population that has a normal 
distribution. 
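
For readers without MINITAB or a TI-83/84 Plus, the coordinates of a normal probability plot can be approximated in Python. The sketch below pairs each sorted height from Example 1 with a normal score computed from the plotting position (i − 0.375)/(n + 0.25). That formula is an assumption; software packages use slightly different positions, so the scores may not match a calculator exactly. The correlation between heights and scores is included as a rough numerical check on linearity and is not part of the text's method.

```python
from math import sqrt
from statistics import NormalDist, mean

# Heights (in inches) from Example 1.
heights = [74, 69, 78, 75, 73, 71, 80, 82, 81, 76, 86, 77]


def normal_scores(n):
    """Expected z-scores for n ordered observations (assumed plotting positions)."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson_r(xs, ys):
    """Sample correlation coefficient between xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


ordered = sorted(heights)
scores = normal_scores(len(ordered))

# Each pair is one plotted point: (observed value, z-score).
for x, z in zip(ordered, scores):
    print(f"{x:3d}  {z:6.3f}")

r = pearson_r(ordered, scores)
print(f"correlation between heights and normal scores: {r:.3f}")
```

A correlation close to 1 is consistent with points that are approximately linear, which matches the interpretation given above.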


> Try It Yourself 1 


The balances (in dollars) on student loans for 18 randomly selected college 
seniors are listed. 


29,150 16,980 12,470 19,235 15,875 8,960 
16,105 14,575 39,860 20,170 9,710 19,650 
21,590 8,200 18,100 25,530 9,285 10,075 


a. Use a technology tool to construct a normal probability plot. Are the points 
approximately linear? 

b. Identify any possible outliers. 

c. Interpret your answer. Answer: Page A49 


To see that the points are approximately linear, you can graph the regression line 
for the original data values and their corresponding z-scores. The regression line 
for the heights and z-scores from Example 1 is shown in the graph. From the 
graph, you can see that the points lie along the regression line. You can also 
approximate the mean of the data set by determining where the line crosses the
x-axis.
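
The claim about the x-intercept can be checked numerically. The sketch below fits a least-squares line of z-score on height for the Example 1 data and computes where the line crosses z = 0. It assumes the same plotting positions, (i − 0.375)/(n + 0.25), as an illustrative choice; because those scores are symmetric about zero, the x-intercept coincides with the sample mean.

```python
from statistics import NormalDist, mean

# Heights (in inches) from Example 1, sorted for pairing with normal scores.
heights = [74, 69, 78, 75, 73, 71, 80, 82, 81, 76, 86, 77]
ordered = sorted(heights)
n = len(ordered)

# Assumed plotting positions; other software may use a different formula.
z_scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

x_bar = mean(ordered)
z_bar = mean(z_scores)

# Least-squares regression line z = intercept + slope * x.
slope = sum((x - x_bar) * (z - z_bar) for x, z in zip(ordered, z_scores)) / sum(
    (x - x_bar) ** 2 for x in ordered
)
intercept = z_bar - slope * x_bar

# The line crosses the x-axis where z = 0.
x_intercept = -intercept / slope
print(f"slope = {slope:.4f}, x-intercept = {x_intercept:.2f}, mean = {x_bar:.2f}")
```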


[Figure: TI-83/84 PLUS screen showing the regression line through the plotted points.]
Try It Yourself Answers


CHAPTER 1 


Section 1.1 


1a. The population consists of the prices per gallon of regular
gasoline at all gasoline stations in the United States. The 
sample consists of the prices per gallon of regular gasoline 
at the 900 surveyed stations. 


b. The data set consists of the 900 prices. 
2a. Population b. Parameter 


3a. Descriptive statistics involve the statement “76% of 
women and 60% of men had a physical examination 
within the previous year.” 


b. An inference drawn from the study is that a higher 
percentage of women had a physical examination 
within the previous year. 


Section 1.2 


1a. City names and city populations


b. City names: Nonnumerical 
City populations: Numerical 


c. City names: Qualitative 
City populations: Quantitative 


2a. (1) The final standings represent a ranking of basketball 
teams. 


(2) The collection of phone numbers represents labels. 
b. (1) Ordinal, because the data can be put in order. 


(2) Nominal, because no mathematical computations can 
be made. 


3a. (1) The data set is the collection of body temperatures. 
(2) The data set is the collection of heart rates. 


b. (1) Interval, because the data can be ordered and 
meaningful differences can be calculated, but it 
does not make sense to write a ratio using the 
temperatures. 


(2) Ratio, because the data can be ordered, meaningful 
differences can be calculated, the data can be 
written as a ratio, and the data set contains an 
inherent zero. 


Section 1.3 


1a. (1) Focus: Effect of exercise on relieving depression 


(2) Focus: Success rates of graduates of a large university 
in finding a job within one year of graduation 


b. (1) Population: Collection of all people with depression 


(2) Population: The employment status of all graduates of 
a large university one year after graduation 


c. (1) Experiment 
(2) Survey 




2a. There is no way to tell why the people quit smoking. 
They could have quit smoking as a result of either chewing 
the gum or watching the DVD. 


b. Two experiments could be done; one using the gum and 
the other using the DVD. 


3a. Answers will vary. Sample answer: Start with the first 
digits 92630782 .... 


b. 92|63|07|82|40|19|26 
c. 63, 7, 40, 19, 26 
4a. Sample selection: 


(1) The sample was selected by using the students in a 
randomly chosen class. 


(2) The sample was selected by numbering each student 
in the school, randomly choosing a starting number, 
and selecting students at regular intervals from the 
starting number. 


Sampling technique: 
(1) Cluster sampling 
(2) Systematic sampling 
b. (1) The sample may be biased because some classes may 
be more familiar with stem cell research than other 
classes and have stronger opinions. 


(2) The sample may be biased if there is any regularly 
occurring pattern in the data. 


CHAPTER 2 


Section 2.1 


1a. 8 classes 


b. Min = 35; Max = 89; Class width = 7 

c. Lower limit   Upper limit 
       35            41 
       42            48 
       49            55 
       56            62 
       63            69 
       70            76 
       77            83 
       84            90 


d. See part (e). 


e. Class   Frequency, f 
   35-41        2 
   42-48        5 
   49-55        7 
   56-62        7 
   63-69       10 
   70-76        5 
   77-83        8 
   84-90        6 


2a. See part (b). 


b. Class   Frequency, f   Midpoint   Relative frequency   Cumulative frequency 
   35-41        2            38           0.04                    2 
   42-48        5            45           0.10                    7 
   49-55        7            52           0.14                   14 
   56-62        7            59           0.14                   21 
   63-69       10            66           0.20                   31 
   70-76        5            73           0.10                   36 
   77-83        8            80           0.16                   44 
   84-90        6            87           0.12                   50 
           Σf = 50                   Σ(f/n) = 1 


c. The most common age bracket for the 50 richest people is 
63-69. 72% of the 50 richest people are older than 55. 
4% of the 50 richest people are younger than 42. 
(Answers will vary.) 


3a. Class boundaries 

    34.5-41.5 
    41.5-48.5 
    48.5-55.5 
    55.5-62.5 
    62.5-69.5 
    69.5-76.5 
    76.5-83.5 
    83.5-90.5 


b. Use class midpoints for the horizontal scale and frequency 
for the vertical scale. (Class boundaries can also be used 
for the horizontal scale.) 


c. [Graph: Ages of the 50 Richest People; vertical axis: Frequency] 


d. Same as 2(c). 


4a. Same as 3(b). 

b. See part (c). 

c. [Graph: Ages of the 50 Richest People; vertical axis: Frequency] 


d. The frequency of ages increases up to 66 and then 
decreases. 


5abc. [Graph: Ages of the 50 Richest People; horizontal axis: Age; 
vertical axis scaled from 0.04 to 0.24] 
6a. Use upper class boundaries for the horizontal scale and 
cumulative frequency for the vertical scale. 


b. See part (c). 

c. [Graph: Ages of the 50 Richest People] 


d. Approximately 40 of the 50 richest people are 80 years old 
or younger. 


e. Answers will vary. 


7a. Enter data. 


Section 2.2 
1a. 3 
    4 
    5 
    6 
    7 
    8 

b. Key: 3|6 = 36 
    3 | 6 5 
    4 | 9 7 6 4 3 2 
    5 | 9 8 7 6 4 4 3 3 1 1 
    6 | 9 9 8 7 6 6 5 5 4 3 1 1 0 
    7 | 8 8 7 6 3 3 3 2 
    8 | 9 9 7 6 6 5 3 3 2 1 0 

c. Key: 3|5 = 35 
    3 | 5 6 
    4 | 2 3 4 6 7 9 
    5 | 1 1 3 3 4 4 6 7 8 9 
    6 | 0 1 1 3 4 5 5 6 6 7 8 9 9 
    7 | 2 3 3 3 6 7 8 8 
    8 | 0 1 2 3 3 5 6 6 7 9 9 


d. More than 50% of the 50 richest people are older than 60. 
(Answers will vary.) 


2ab. Key: 3|5 = 35 
    3 | 5 6 
    4 | 2 3 4 
    4 | 6 7 9 
    5 | 1 1 3 3 4 4 
    5 | 6 7 8 9 
    6 | 0 1 1 3 4 
    6 | 5 5 6 6 7 8 9 9 
    7 | 2 3 3 3 
    7 | 6 7 8 8 
    8 | 0 1 2 3 3 
    8 | 5 6 6 7 9 9 


c. Most of the 50 richest people are older than 60. (Answers 
will vary.) 

3a. Use age for the horizontal axis. 


b. [Dot plot: Ages of the 50 Richest People; horizontal axis: Age, 30 to 90] 


c. A large percentage of the ages are over 60. (Answers will 
vary.) 


4a. Type of degree       f      Relative frequency   Angle 
    Associate’s          455    0.23                 82.8° 
    Bachelor’s           1052   0.54                 194.4° 
    Master’s             325    0.17                 61.2° 
    First professional   71     0.04                 14.4° 
    Doctoral             38     0.02                 7.2° 
                         Σf = 1941   Σ(f/n) ≈ 1      Σ = 360° 

b. [Pie chart: Earned Degrees Conferred in 1990; Associate’s 23%, 
Bachelor’s 54%, Master’s 17%, First professional 4%, Doctoral 2%] 


c. From 1990 to 2007, as percentages of the total degrees 
conferred, associate’s degrees increased by 1%, bachelor’s 
degrees decreased by 3%, master’s degrees increased by 
3%, first professional degrees decreased by 1%, and 
doctoral degrees remained unchanged. 


5a. Cause              Frequency, f 
    Auto Dealers       14,668 
    Auto Repair        9728 
    Home Furnishing    7792 
    Computer Sales     5733 
    Dry Cleaning       4649 


b. [Graph: Causes of BBB Complaints; vertical axis: Frequency] 


c. It appears that the auto industry (dealers and repair 
shops) accounts for the largest portion of complaints filed 
at the BBB. (Answers will vary.) 


6ab. [Graph: Salaries; horizontal axis: Length of employment (in years), 
2 to 10; vertical axis scaled from 20,000 to 50,000] 

c. It appears that the longer an employee is with the company, the 
larger the employee’s salary will be. 

7ab. [Graph: Cellular Phone Bills; horizontal axis: Year] 

c. The average bill increased from 1998 to 2004, then it hovered 
around $50.00 from 2004 to 2008. 
Section 2.3 
1a. 1193   b. 79.5 

c. The mean height of the players is about 79.5 inches. 


2a. 18, 18, 19, 19, 19, 20, 21, 21, 21, 21, 23, 24, 24, 26, 27, 27, 29, 
30, 30, 30, 33, 33, 34, 35, 38 

b. 24 

c. The median age of the sample of fans at the concert is 24. 


3a. 25, 60, 80, 97, 100, 130, 140, 200, 220, 250   b. 115 

c. The median price of the sample of digital photo frames is 
$115. 


4a. 324, 385, 450, 450, 462, 475, 540, 540, 564, 618, 624, 638, 670, 
670, 670, 705, 720, 723, 750, 750, 825, 830, 912, 975, 980, 980, 
1100, 1260, 1420, 1650 

b. 670 

c. The mode of the prices for the sample of South Beach, FL 
condominiums is $670. 


5a. Yes 

b. In this sample, there were more people who thought public 
cell phone conversations were rude than people who did 
not or had no opinion. 


6a. 21.6; 21; 20 

b. The mean in Example 6 (x̄ ≈ 23.8) was heavily influenced 
by the entry 65. Neither the median nor the mode was 
affected as much by the entry 65. 


7ab. 
Source         Score, x   Weight, w   xw 
Test mean      86         0.50        43.0 
Midterm        96         0.15        14.4 
Final exam     98         0.20        19.6 
Computer lab   98         0.10        9.8 
Homework       100        0.05        5.0 
                          Σw = 1.00   Σ(x·w) = 91.8 
c. 91.8 
d. The weighted mean for the course is 91.8. So you did get 
an A. 
8abc. 
Class   Midpoint, x   Frequency, f   xf 
35-41   38            2              76 
42-48   45            5              225 
49-55   52            7              364 
56-62   59            7              413 
63-69   66            10             660 
70-76   73            5              365 
77-83   80            8              640 
84-90   87            6              522 
                      N = 50         Σ(x·f) = 3265 
d. 65.3 
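As a check on the two tables above, the weighted mean and the mean of a frequency distribution can both be computed with the same formula, Σ(x·w)/Σw. The sketch below uses the scores, weights, midpoints, and frequencies listed in the tables; the function name `weighted_mean` is chosen here for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Course grade: scores weighted by their share of the final grade.
scores = [86, 96, 98, 98, 100]
weights = [0.50, 0.15, 0.20, 0.10, 0.05]
course_mean = weighted_mean(scores, weights)

# Grouped data: class midpoints weighted by class frequencies.
midpoints = [38, 45, 52, 59, 66, 73, 80, 87]
frequencies = [2, 5, 7, 7, 10, 5, 8, 6]
grouped_mean = weighted_mean(midpoints, frequencies)
```

Treating the frequencies as weights is what makes the grouped mean an approximation: every entry in a class is represented by the class midpoint.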
Section 2.4 
1a. Min = 23, or $23,000; Max = 58, or $58,000 
b. 35, or $35,000 
c. The range of the starting salaries for Corporation B, which 
is 35, or $35,000, is much larger than the range of 
Corporation A. 
2a. 41.5, or $41,500 
b. Salary, x             Deviation, x - μ 
   (1000s of dollars)    (1000s of dollars) 
   23                    -18.5 
   29                    -12.5 
   32                    -9.5 
   40                    -1.5 
   41                    -0.5 
   41                    -0.5 
   49                    7.5 
   50                    8.5 
   52                    10.5 
   58                    16.5 
   Σx = 415              Σ(x - μ) = 0 


3ab. μ = 41.5, or $41,500 


Salary, x   x - μ    (x - μ)² 
23          -18.5    342.25 
29          -12.5    156.25 
32          -9.5     90.25 
40          -1.5     2.25 
41          -0.5     0.25 
41          -0.5     0.25 
49          7.5      56.25 
50          8.5      72.25 
52          10.5     110.25 
58          16.5     272.25 
Σx = 415    Σ(x - μ) = 0    Σ(x - μ)² = 1102.5 
c. 110.3   d. 10.5, or $10,500 
e. The population standard deviation is 10.5, or $10,500. 
4a. See 3ab.   b. 122.5   c. 11.1, or $11,100 
d. The sample standard deviation is 11.1, or $11,100. 
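The difference between dividing by N and dividing by n - 1 can be checked with Python's standard `statistics` module, using the ten starting salaries (in thousands of dollars) from the table above. This is a sketch for verification, not the textbook's own procedure.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

salaries = [23, 29, 32, 40, 41, 41, 49, 50, 52, 58]  # 1000s of dollars

mu = mean(salaries)                 # population mean
pop_var = pvariance(salaries)       # sum of squared deviations / N
pop_sd = pstdev(salaries)           # square root of pop_var
sample_var = variance(salaries)     # sum of squared deviations / (n - 1)
sample_sd = stdev(salaries)         # square root of sample_var
```

With the sum of squared deviations equal to 1102.5, dividing by 10 gives 110.25 (about 110.3) and dividing by 9 gives 122.5, which matches the two sets of answers.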
5a. Enter data.   b. 37.89; 3.98 
6a. 7, 7, 7, 7, 7, 13, 13, 13, 13, 13   b. 3 
7a. 1 standard deviation   b. 34% 
c. Approximately 34% of women ages 20-29 are between 
64.3 and 66.92 inches tall. 


8a. 0   b. 70.6 
c. At least 75% of the data lie within 2 standard deviations 
of the mean. At least 75% of the population of Alaska is 
between 0 and 70.6 years old. 


9a. x   f    xf 
    0   10   0 
    1   19   19 
    2   7    14 
    3   7    21 
    4   5    20 
    5   1    5 
    6   1    6 
        n = 50   Σxf = 85 
b. 1.7 
c. x - x̄   (x - x̄)²   (x - x̄)²f 
   -1.7    2.89       28.90 
   -0.7    0.49       9.31 
   0.3     0.09       0.63 
   1.3     1.69       11.83 
   2.3     5.29       26.45 
   3.3     10.89      10.89 
   4.3     18.49      18.49 
                      Σ(x - x̄)²f = 106.5 
d. 1.5 


10a. 


Class     x       f     xf 
0-99      49.5    380   18,810 
100-199   149.5   230   34,385 
200-299   249.5   210   52,395 
300-399   349.5   50    17,475 
400-499   449.5   60    26,970 
500+      650.0   70    45,500 
                  n = 1000   Σxf = 195,535 
b. 195.5 
c. x - x̄    (x - x̄)²      (x - x̄)²f 
   -146.0   21,316        8,100,080 
   -46.0    2116          486,680 
   54.0     2916          612,360 
   154.0    23,716        1,185,800 
   254.0    64,516        3,870,960 
   454.5    206,570.25    14,459,917.5 
                          Σ(x - x̄)²f = 28,715,797.5 
d. 169.5 
Section 2.5 


1a. 35, 36, 42, 43, 44, 46, 47, 49, 51, 51, 53, 53, 54, 54, 56, 57, 58, 
59, 60, 61, 61, 63, 64, 65, 65, 66, 66, 67, 68, 69, 69, 72, 73, 73, 
73, 76, 77, 78, 78, 80, 81, 82, 83, 83, 85, 86, 86, 87, 89, 89 
b. 65.5   c. 54, 78 
d. About one fourth of the 50 richest people are 54 years 
old or younger; one half are 65.5 years old or younger; 
and about three fourths of the 50 richest people are 
78 years old or younger. 
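The quartiles above can be reproduced by taking the median of the ordered data and then the medians of the lower and upper halves. The sketch below assumes that "median of each half" convention (software packages sometimes use other quartile rules and may give slightly different values); the helper name `quartiles_by_halves` is illustrative.

```python
from statistics import median

def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median of each half of the ordered data."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # For an odd count, the middle entry is left out of both halves.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return median(lower), median(xs), median(upper)

ages = [35, 36, 42, 43, 44, 46, 47, 49, 51, 51, 53, 53, 54, 54, 56, 57, 58,
        59, 60, 61, 61, 63, 64, 65, 65, 66, 66, 67, 68, 69, 69, 72, 73, 73,
        73, 76, 77, 78, 78, 80, 81, 82, 83, 83, 85, 86, 86, 87, 89, 89]

q1, q2, q3 = quartiles_by_halves(ages)
iqr = q3 - q1  # interquartile range
```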
2a. Enter data. b. 17, 23, 28.5 
c. One quarter of the tuition costs is $17,000 or less, one half 
is $23,000 or less, and three quarters is $28,500 or less. 
3a. 54,78 b. 24 
c. The ages of the 50 richest people in the middle portion of 
the data set vary by at most 24 years. 
4a. Min = 35, Q1 = 54, Q2 = 65.5, Q3 = 78, Max = 89 
bc. [Box-and-whisker plot: Ages of the 50 Richest People; marked values 
35, 54, 65.5, 78, 89; horizontal axis: Age, 30 to 90] 
d. It appears that half of the ages are between 54 and 78. 


5a. 50th percentile 
b. 50% of the 50 richest people are younger than 66. 
6a. μ = 70, σ = 8 

    z = (60 - 70)/8 = -1.25 
    z = (71 - 70)/8 = 0.125 
    z = (92 - 70)/8 = 2.75 


b. From the z-scores, $60 is 1.25 standard deviations below 
the mean, $71 is 0.125 standard deviation above the mean, 
and $92 is 2.75 standard deviations above the mean. 


7a. Best Actor: μ = 43.7, σ = 8.7 
Best Actress: μ = 35.9, σ = 11.4 
b. Sean Penn: z = 0.49 
Kate Winslet: z = —0.25 


c. The age of Sean Penn is 0.49 standard deviation above 
the mean and the age of Kate Winslet is 0.25 standard 
deviation below the mean. Both z-scores fall between —2 
and 2, so neither would be considered unusual. Comparing 
the two measures indicates that Sean Penn is further 
above the average age of actors than Kate Winslet is 
below the average age of actresses. (Answers will vary.) 


CHAPTER 3 


Section 3.1 
1ab. (1) [Tree diagram: Yes, No, Not sure; each branching into M and F] 
     (2) [Tree diagram: Yes, No, Not sure; each branching into NE, S, MW, W] 


c. (1) 6 (2) 12 
d. (1) Let Y = Yes, N = No, NS = Not sure, M = Male, 
F = Female. 

Sample space = {YM, YF, NM, NF, NSM, NSF} 

(2) Let Y = Yes, N = No, NS = Not sure, 
NE = Northeast, S = South, MW = Midwest, 
W = West. 

Sample space = 
{YNE, YS, YMW, YW, NNE, NS, NMW, NW, 
NSNE, NSS, NSMW, NSW} 
2a. (1) 6 (2) 1 
b. (1) Not a simple event because it is an event that consists 
of more than a single outcome. 


(2) Simple event because it is an event that consists of a 
single outcome. 
3a. Manufacturer: 4, Size: 2, Color: 5   b. 40 


c. [Tree diagram: manufacturers F, G, H, T; each branching into sizes 
C and M; each size branching into colors W, R, B, G, T] 


4a. (1) Each letter is an event (26 choices for each). 


(2) Each letter is an event (26, 25, 24, 23, 22, and 21 
choices). 


(3) Each letter is an event (22, 26, 26, 26, 26, and 26 
choices). 


b. (1) 308,915,776 (2) 165,765,600 (3) 261,390,272 
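The three counts in part (b) follow from the Fundamental Counting Principle: multiply the number of choices at each position. A short sketch, with variable names chosen here for illustration:

```python
from math import perm

# (1) Six letters, repetition allowed: 26 choices at every position.
any_letters = 26 ** 6

# (2) Six letters, no repetition: 26 * 25 * 24 * 23 * 22 * 21.
no_repeats = perm(26, 6)

# (3) 22 choices for the first letter, 26 for each of the other five.
restricted_first = 22 * 26 ** 5
```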
5a. (1) 52 (2) 52 (3) 52 

b. (1) 1 (2) 13 (3) 52 

c. (1) 0.019 (2) 0.25 (3) 1 
6a. The event is “the next claim processed is fraudulent.” 
The frequency is 4. 

b. 100   c. 0.04 

7a. 54   b. 1000   c. 0.054 


8 a. The event is “salmon successfully passing through a dam 
on the Columbia River.” 


b. Estimated   c. Empirical probability 
9a. 0.18 b. 0.82  c. 3 or 0.82 
10a. 5   b. 0.313 


11a. 10,000,000   b. 1/10,000,000 


Section 3.2 


1a. (1) 30 and 102 (2) 11 and 50 
b. (1) 0.294 (2) 0.22 
2a. (1) Yes (2) No 


b. (1) Dependent (2) Independent 


3a. (1) Independent (2) Dependent 


b. (1) 0.723 (2) 0.059 
4a. (1) Event (2) Event (3) Complement 
b. (1) 0.729 (2) 0.001 (3) 0.999 


c. (1) The event cannot be considered unusual because its 
probability is not less than or equal to 0.05. 


(2) The event can be considered unusual because its 
probability is less than or equal to 0.05. 


(3) The event cannot be considered unusual because its 
probability is not less than or equal to 0.05. 


5a. (1) and (2) A = {is female}, 
B = {works in health field} 
b. (1) P(A and B) = P(A) · P(B|A) = (0.65) · (0.25) 
(2) P(A and B') = P(A) · (1 - P(B|A)) 
= (0.65) · (0.75) 


c. (1) 0.163 (2) 0.488 


Section 3.3 


1a. (1) None are true. (2) None are true. 
(3) All are true. 
b. (1) Not mutually exclusive (2) Not mutually exclusive 
(3) Mutually exclusive 
2a. (1) Mutually exclusive (2) Not mutually exclusive 
b. (1) 2,5 (2) %.8,4 © (1) 0.667 (2) 0.423 
3a. A = {sales between $0 and $24,999} 
B = {sales between $25,000 and $49,999} 
b. A and B cannot occur at the same time. 
A and B are mutually exclusive. 


e 3, d. 0.222 


4a. (1) A = {type B} 
B = {type AB} 
(2) A = {type O} 


B = {Rh-positive } 
b. (1) A and B cannot occur at the same time. 
A and B are mutually exclusive. 
(2) A and B can occur at the same time. 


A and B are not mutually exclusive. 


c. (1) 45/409, 16/409 (2) 184/409, 344/409 


d. (1) 0.149 (2) 0.910 
5a. 0.141  b. 0.859 
Section 3.4 


1a. 8   b. 40,320 
2a. 336 


b. There are 336 possible ways that the subject can pick a 
first, second, and third activity. 


3a. n = 12, r = 4   b. 11,880 
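The counts in this section can be verified with the standard `math` module. The helper `distinguishable_permutations` is a name introduced here for illustration; it computes n!/(n1! · n2! · ... · nk!).

```python
from math import comb, factorial, perm

ways_to_order_8 = factorial(8)      # 8! orderings of 8 items
pick_3_of_8 = perm(8, 3)            # 8P3: ordered choice of 3 from 8
pick_4_of_12 = perm(12, 4)          # 12P4: ordered choice of 4 from 12
committees = comb(20, 3)            # 20C3: unordered choice of 3 from 20

def distinguishable_permutations(*group_sizes):
    """n! / (n1! * n2! * ... * nk!) where n is the sum of the group sizes."""
    result = factorial(sum(group_sizes))
    for size in group_sizes:
        result //= factorial(size)
    return result

arrangements = distinguishable_permutations(6, 9, 5)  # n = 20
```

Order matters for `perm` and not for `comb`, which is the distinction between the permutation and combination answers in this section.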


4a. n = 20, n1 = 6, n2 = 9, n3 = 5   b. 77,597,520 
5a. n = 20, r = 3   b. 1140 
c. There are 1140 different possible three-person 
committees that can be selected from 20 employees. 
7a. 1 outcome and 180 distinguishable permutations 
b. 0.006 
8a. 3003   b. 3,162,510   c. 0.0009 
9a. 10   b. 220   c. 0.045 


CHAPTER 4 


Section 4.1 


1a. (1) Measured (2) Counted 
b. (1) The random variable is continuous because x can be 
any speed up to the maximum speed of a Space Shuttle. 
(2) The random variable is discrete because the number 
of calves born on a farm in one year is countable. 
2ab. 
x   f    P(x) 
0   16   0.16 
1   19   0.19 
2   15   0.15 
3   21   0.21 
4   9    0.09 
5   10   0.10 
6   8    0.08 
7   2    0.02 
    n = 100   ΣP(x) = 1 
c. [Graph: New Employee Sales; horizontal axis: Number of sales per day, 0 to 7] 


3a. Each P(x) is between 0 and 1.   b. ΣP(x) = 1 


c. Because both conditions are met, the distribution is a 
probability distribution. 


4a. (1) Yes, each outcome is between 0 and 1. 
(2) Yes, each outcome is between 0 and 1. 
b. (1) Yes (2) Yes 
c. (1) A probability distribution 
(2) A probability distribution 


5ab. 
x   P(x)   xP(x) 
0   0.16   0.00 
1   0.19   0.19 
2   0.15   0.30 
3   0.21   0.63 
4   0.09   0.36 
5   0.10   0.50 
6   0.08   0.48 
7   0.02   0.14 
    ΣP(x) = 1   ΣxP(x) = 2.60 
c. μ = 2.6 
On average, a new employee makes 2.6 sales per day. 
6ab. 
x   P(x)   x - μ   (x - μ)²   P(x)(x - μ)² 
0   0.16   -2.6    6.76       1.0816 
1   0.19   -1.6    2.56       0.4864 
2   0.15   -0.6    0.36       0.0540 
3   0.21   0.4     0.16       0.0336 
4   0.09   1.4     1.96       0.1764 
5   0.10   2.4     5.76       0.5760 
6   0.08   3.4     11.56      0.9248 
7   0.02   4.4     19.36      0.3872 
    ΣP(x) = 1                 ΣP(x)(x - μ)² = 3.72 
c. 1.9 
d. Most of the data values differ from the mean by no more 
than 1.9 sales per day. 
7ab. 
Gain, x             $1995    $995     $495     $245     $95      -$5 
Probability, P(x)   1/2000   1/2000   1/2000   1/2000   1/2000   1995/2000 
c. -$3.08 
d. Because the expected value is negative, you can expect to 
lose an average of $3.08 for each ticket you buy. 


Section 4.2 


1a. Trial: answering a question 
Success: question answered correctly 
b. Yes 


c. It is a binomial experiment; 
n = 10, p = 0.25, q = 0.75, x = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 


2a. Trial: drawing a card with replacement 
Success: card drawn is a club 
Failure: card drawn is not a club 
b. n = 5, p = 0.25, q = 0.75, x = 3 
c. P(3) = [5!/(2!3!)] (0.25)^3 (0.75)^2 ≈ 0.088 
3a. Trial: selecting an adult and asking a question 
Success: selecting an adult who likes texting because it 
works where talking won’t do 
Failure: selecting an adult who does not like texting 
because it works where talking won’t do 
b. n = 7, p = 0.75, q = 0.25, x = 0, 1, 2, 3, 4, 5, 6, 7 
c. P(0) = 7C0 (0.75)^0 (0.25)^7 ≈ 0.00006 
P(1) = 7C1 (0.75)^1 (0.25)^6 ≈ 0.00128 
P(2) = 7C2 (0.75)^2 (0.25)^5 ≈ 0.01154 
P(3) = 7C3 (0.75)^3 (0.25)^4 ≈ 0.05768 
P(4) = 7C4 (0.75)^4 (0.25)^3 ≈ 0.17303 
P(5) = 7C5 (0.75)^5 (0.25)^2 ≈ 0.31146 
P(6) = 7C6 (0.75)^6 (0.25)^1 ≈ 0.31146 
P(7) = 7C7 (0.75)^7 (0.25)^0 ≈ 0.13348 
d. 
x   P(x) 
0   0.00006 
1   0.00128 
2   0.01154 
3   0.05768 
4   0.17303 
5   0.31146 
6   0.31146 
7   0.13348 
    ΣP(x) = 1 
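The binomial probabilities in this section all come from the formula P(x) = nCx · p^x · q^(n - x). A minimal sketch that reproduces the card-drawing answer (n = 5, p = 0.25, x = 3) and the full texting distribution (n = 7, p = 0.75); the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Probability of exactly 3 clubs in 5 draws with replacement.
p_three_clubs = binomial_pmf(5, 0.25, 3)

# Full distribution for n = 7, p = 0.75.
texting = [binomial_pmf(7, 0.75, x) for x in range(8)]
```

Summing `texting` gives 1 (up to floating-point rounding), which is the second condition for a probability distribution.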
4a. n = 250, p = 0.71, x = 178   b. 0.056 
c. The probability that exactly 178 people from a random 
sample of 250 people in the United States will use more 
than one topping on their hotdogs is about 0.056. 


d. Because 0.056 is not less than or equal to 0.05, this event 
is not unusual. 


5a. (1) x = 2 (2) x = 2, 3, 4, or 5 (3) x = 0 or 1 
b. (1) 0.217 
(2) 0.217, 0.058, 0.008, 0.0004; 0.283 
(3) 0.308, 0.409; 0.717 


c. (1) The probability that exactly two of the five men 
consider fishing their favorite leisure-time activity 
is about 0.217. 


(2) The probability that at least two of the five men 
consider fishing their favorite leisure-time activity 
is about 0.283. 


(3) The probability that fewer than two of the five men 
consider fishing their favorite leisure-time activity 
is about 0.717. 


6a. Trial: selecting a business and asking if it has a website 
Success: selecting a business with a website 
Failure: selecting a business without a website 

b. n = 10, p = 0.55, x = 4 

c. 0.160 


d. The probability that exactly 4 of the 10 small businesses 
have websites is 0.160. 


e. Because 0.160 is greater than 0.05, this event is not 
unusual. 


7a. 0.001, 0.022, 0.142, 0.404, 0.430 


b. 
x   P(x) 
0   0.001 
1   0.022 
2   0.142 
3   0.404 
4   0.430 

c. [Graph: Owning a Computer; horizontal axis: Number of households, 0 to 4] 
Skewed left 


d. Yes, it would be unusual if exactly zero or exactly one of 
the four households owned a computer, because each of 
these events has a probability that is less than 0.05. 


8a. Success: selecting a clear day 
n = 31, p = 0.44, q = 0.56 
b. 13.6   c. 7.6   d. 2.8 


e. On average, there are about 14 clear days during the 
month of May. 


f. A May with fewer than 8 clear days or more than 19 clear 
days would be unusual. 


Section 4.3 


1a. 0.74, 0.192   b. 0.932 


c. The probability that LeBron makes his first free throw 
shot before his third attempt is 0.932. 

2a. P(0) ≈ 0.050 
P(1) ≈ 0.149 
P(2) ≈ 0.224 
P(3) ≈ 0.224 
P(4) ≈ 0.168 
b. 0.815   c. 0.185 


d. The probability that more than four accidents will occur in 
any given month at the intersection is 0.185. 


3a. 0.10   b. 0.10, 3   c. 0.0002 


d. The probability of finding three brown trout in any given 
cubic meter of the lake is 0.0002. 


e. Because 0.0002 is less than 0.05, this can be considered an 
unusual event. 
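The geometric and Poisson answers in this section can be checked with the two probability formulas P(x) = p · q^(x - 1) and P(x) = μ^x · e^(-μ) / x!. In the sketch below, the mean of 3 accidents per month is an assumption inferred from P(0) ≈ 0.050; the other parameters (p = 0.74, μ = 0.10, x = 3) are the values listed in the answers.

```python
from math import exp, factorial

def geometric_pmf(p, x):
    """Probability that the first success occurs on trial x."""
    return p * (1 - p) ** (x - 1)

def poisson_pmf(mu, x):
    """Probability of exactly x occurrences when the mean is mu."""
    return mu ** x * exp(-mu) / factorial(x)

# First made free throw on attempt 1 or 2 (before the third attempt).
before_third = geometric_pmf(0.74, 1) + geometric_pmf(0.74, 2)

# At most four accidents, then more than four, assuming a mean of 3.
at_most_four = sum(poisson_pmf(3, x) for x in range(5))
more_than_four = 1 - at_most_four

# Exactly three brown trout when the mean is 0.10 per cubic meter.
three_trout = poisson_pmf(0.10, 3)
```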


CHAPTER 5 


Section 5.1 
1a. A: x = 45, B: x = 60, C: x = 45; B has the greatest mean. 


b. Curve C is more spread out, so curve C has the greatest 
standard deviation. 


2a. x = 660   b. 630, 690; 30 
3. (1) 0.0143 (2) 0.9850 
4a. [Standard normal curve with z = 2.13 marked]   b. 0.9834 


5a. [Standard normal curve with z = -2.16 marked]   b. 0.0154   c. 0.9846 


6a. 0.0885   b. 0.0152   c. 0.0733 


d. 7.33% of the area under the curve falls between z = -2.165 
and z = -1.35. 
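The areas above are read from the Standard Normal Table; the same values can be obtained from the cumulative distribution function of the standard normal distribution. A sketch using `statistics.NormalDist` from the Python standard library (results agree with the table to four decimal places for these z-scores):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area to the left of z = 2.13.
area_left_of_2_13 = std_normal.cdf(2.13)

# Area to the left of z = -2.16, and the area to its right.
area_left_of_neg_2_16 = std_normal.cdf(-2.16)
area_right_of_neg_2_16 = 1 - area_left_of_neg_2_16

# Area between two z-scores: subtract the smaller cumulative area.
area_between = std_normal.cdf(-1.35) - std_normal.cdf(-2.165)
```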


Section 5.2 
1a. [Normal curve] 

b. 0.86 

c. 0.1949 

d. The probability that a randomly selected vehicle 
is violating the 70 mile per hour speed limit is 0.1949. 


2a. [Normal curve; horizontal axis: Time (in minutes), 9 to 81] 

b. -1, 1.25 

c. 0.1587; 0.8944; 0.7357 

d. If 150 shoppers enter the store, then you would expect 
150(0.7357) = 110.355, or about 110, shoppers to 
be in the store between 33 and 60 minutes. 


3a. Read user’s guide for the technology tool. 


b. 0.5105 


c. The probability that a randomly selected U.S. person’s 
triglyceride level is between 100 and 150 is 0.5105. 


Section 5.3 


1a. (1) 0.0384 (2) 0.0250 and 0.9750 


bc. (1) -1.77 (2) ±1.96 
2a. (1) Area = 0.10 (2) Area = 0.20 (3) Area = 0.99 


bc. (1) -1.28 (2) -0.84 (3) 2.33 


3a. μ = 52, σ = 15 


b. 17.05; 98.5; 60.7 


c. 17.05 pounds is below the mean; 60.7 pounds and 98.5 
pounds are above the mean. 
4ab. [Normal curve with an area of 1% marked at z = -2.33]   c. 116.93 
d. So, the longest braking distance a Nissan Altima 
could have and still be in the bottom 1% is about 
117 feet. 
5ab. [Normal curve with an area of 10% marked at z = -1.28]   c. 8.512 
d. So, the maximum length of time an employee could 
have worked and still be laid off is about 8.5 years. 
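The answers in this section work in the opposite direction from Sections 5.1 and 5.2: start with an area, find the z-score, then convert to a data value with x = μ + zσ. A sketch using the inverse cumulative distribution function; the helper name `value_from_z` is introduced here, and the z-scores -2.33, 3.10, and 0.58 in the last line are back-solved from the listed answers, so treat them as assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()

# z-scores that cut off cumulative areas of 0.10, 0.20, and 0.99.
z_10 = std_normal.inv_cdf(0.10)
z_20 = std_normal.inv_cdf(0.20)
z_99 = std_normal.inv_cdf(0.99)

def value_from_z(mu, sigma, z):
    """Transform a z-score back to a data value: x = mu + z * sigma."""
    return mu + z * sigma

# With mu = 52 and sigma = 15 (z-scores assumed, see lead-in).
weights = [value_from_z(52, 15, z) for z in (-2.33, 3.10, 0.58)]
```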
Section 5.4 
1a. 
Sample    Mean  | Sample    Mean  | Sample    Mean 
1, 1, 1   1     | 3, 3, 5   3.67  | 5, 7, 1   4.33 
1, 1, 3   1.67  | 3, 3, 7   4.33  | 5, 7, 3   5 
1, 1, 5   2.33  | 3, 5, 1   3     | 5, 7, 5   5.67 
1, 1, 7   3     | 3, 5, 3   3.67  | 5, 7, 7   6.33 
1, 3, 1   1.67  | 3, 5, 5   4.33  | 7, 1, 1   3 
1, 3, 3   2.33  | 3, 5, 7   5     | 7, 1, 3   3.67 
1, 3, 5   3     | 3, 7, 1   3.67  | 7, 1, 5   4.33 
1, 3, 7   3.67  | 3, 7, 3   4.33  | 7, 1, 7   5 
1, 5, 1   2.33  | 3, 7, 5   5     | 7, 3, 1   3.67 
1, 5, 3   3     | 3, 7, 7   5.67  | 7, 3, 3   4.33 
1, 5, 5   3.67  | 5, 1, 1   2.33  | 7, 3, 5   5 
1, 5, 7   4.33  | 5, 1, 3   3     | 7, 3, 7   5.67 
1, 7, 1   3     | 5, 1, 5   3.67  | 7, 5, 1   4.33 
1, 7, 3   3.67  | 5, 1, 7   4.33  | 7, 5, 3   5 
1, 7, 5   4.33  | 5, 3, 1   3     | 7, 5, 5   5.67 
1, 7, 7   5     | 5, 3, 3   3.67  | 7, 5, 7   6.33 
3, 1, 1   1.67  | 5, 3, 5   4.33  | 7, 7, 1   5 
3, 1, 3   2.33  | 5, 3, 7   5     | 7, 7, 3   5.67 
3, 1, 5   3     | 5, 5, 1   3.67  | 7, 7, 5   6.33 
3, 1, 7   3.67  | 5, 5, 3   4.33  | 7, 7, 7   7 
3, 3, 1   2.33  | 5, 5, 5   5     | 
3, 3, 3   3     | 5, 5, 7   5.67  | 
b. 
x̄      f    Probability 
1      1    0.0156 
1.67   3    0.0469 
2.33   6    0.0938 
3      10   0.1563 
3.67   12   0.1875 
4.33   12   0.1875 
5      10   0.1563 
5.67   6    0.0938 
6.33   3    0.0469 
7      1    0.0156 
c. μ_x̄ = 4 
(σ_x̄)² ≈ 1.667 
σ_x̄ ≈ 1.291 
d. μ_x̄ = μ = 4 
(σ_x̄)² = σ²/n = 5/3 ≈ 1.667; σ_x̄ = σ/√n = √5/√3 ≈ 1.291 
2a. 63, 1.4 
b. [Graph: sampling distribution with n = 64; horizontal axis: 
Mean of phone bills (in dollars), 58.8 to 67.2] 


c. With a smaller sample size, the mean stays the same but 
the standard deviation increases. 
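The sampling distribution in Try It Yourself 1 of this section can be rebuilt by brute force: list every sample of size 3 drawn with replacement from the population {1, 3, 5, 7}, compute each sample mean, and summarize the 64 means. The sketch below is a check on the table, with variable names chosen for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt

population = [1, 3, 5, 7]

# All ordered samples of size 3, drawn with replacement (4^3 = 64 samples).
samples = list(product(population, repeat=3))
means = [sum(s) / 3 for s in samples]

# Frequency of each sample mean, rounded to two decimals as in the table.
counts = Counter(round(m, 2) for m in means)

# Mean, variance, and standard deviation of the sample means.
mu_xbar = sum(means) / len(means)
var_xbar = sum((m - mu_xbar) ** 2 for m in means) / len(means)
sd_xbar = sqrt(var_xbar)
```

The population itself has mean 4 and variance 5, so the computed variance of the sample means agrees with σ²/n = 5/3.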
3a. 3.5, 0.05 
[Normal curve; horizontal axis: Mean diameter (in feet), 3.35 to 3.65] 


4a. 25; 0.15   b. -2, 3.33   c. 0.0228; 0.9996; 0.9768 
[Normal curve; horizontal axis: Mean time (in minutes); 
marked values 24.70, 25, 25.30] 


d. Of the samples of 100 drivers ages 15 to 19, 97.68% will 
have a mean driving time between 24.7 and 25.5 minutes. 


5a. 290,600; 10,392.30 
[Normal curve; horizontal axis: Mean sales price (in dollars); 
marked values 269,816, 290,600, 311,385] 
b. -2.46   c. 0.0069; 0.9931 
d. 99.31% of samples of 12 single-family houses will have a 
mean sales price greater than $265,000. 
6a. 0.21; 0.66   b. 0.5832; 0.7454 
c. There is about a 58% chance that an LCD computer 
monitor will cost less than $200. There is about a 75% 


chance that the mean of a sample of 10 LCD computer 
monitors is less than $200. 


Section 5.5 


1a. n = 125, p = 0.05, q = 0.95   b. 6.25, 118.75 
c. Normal distribution can be used.   d. 6.25, 2.44 


2a. (1) 57, 58, ..., 83 (2) ..., 52, 53, 54 
b. (1) 56.5 < x < 83.5 (2) x < 54.5 
3a. Normal distribution can be used.   b. 6.25, 2.44 
c. [Normal curve; horizontal axis: Number responding yes, 0 to 12] 
d. 1.33   e. 0.9082; 0.0918 


4a. Normal distribution can be used.   b. 116, 6.98 
c. [Normal curve; horizontal axis: Number responding never] 
d. -2.22   e. 0.0132 


5a. Normal distribution can be used.   b. 36, 5.23 
c. [Normal curve; horizontal axis: Number responding yes, 20 to 50] 
d. -1.82, -1.625   e. 0.0177 
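The steps behind the normal approximation can be sketched for n = 125 and p = 0.05: check that np and nq are both at least 5, compute μ = np and σ = √(npq), apply the continuity correction, and look up the area. The boundary 9.5 used below is an assumption back-solved from the listed z-score of 1.33; the z-score is rounded to two decimals to mimic a table lookup.

```python
from math import sqrt
from statistics import NormalDist

n, p = 125, 0.05
q = 1 - p

# Conditions for using the normal approximation.
np_, nq = n * p, n * q
can_use_normal = np_ >= 5 and nq >= 5

# Parameters of the approximating normal distribution.
mu = np_
sigma = sqrt(n * p * q)

# Continuity-corrected boundary (assumed, see lead-in) and its z-score.
z = round((9.5 - mu) / sigma, 2)
area_left = NormalDist().cdf(z)
area_right = 1 - area_left
```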


CHAPTER 6 


Section 6.1 


1a. x̄ = 138.5 
b. A point estimate for the population mean number of friends 
is 138.5. 
2a. z_c = 1.96, n = 30, s ≈ 51.0   b. E ≈ 18.3 
c. You are 95% confident that the maximum error of the 
estimate is about 18.3 friends. 
3a. x̄ = 138.5, E ≈ 18.3   b. 120.2, 156.8 
c. With 95% confidence, you can say that the population 
mean number of friends is between 120.2 and 156.8. This 
confidence interval is wider than the one found in 
Example 3. 
4a. Enter the data. 
b. (121.2, 140.4); (118.7, 142.9); (109.2, 152.4) 
c. As the confidence level increases, so does the width of the 
interval. 
5a. n = 30, x̄ = 22.9, σ = 1.5, z_c = 1.645, E ≈ 0.5 
b. (22.4, 23.4) [Tech: (22.5, 23.4)] 
c. With 90% confidence, you can say that the mean age of 
the students is between 22.4 (Tech: 22.5) and 23.4 years. 
Because of the larger sample size, the confidence interval is 
slightly narrower. 
6a. z_c = 1.96, E = 10, s ≈ 53.0   b. n = 108 


c. You should have at least 108 users in your sample. Because 
of the larger margin of error, the sample size needed is 
much smaller. 
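The margin of error, confidence interval, and minimum sample size in this section come from E = z_c · s/√n and n = (z_c · σ/E)². A sketch with the values listed in the answers (z_c = 1.96, s ≈ 51.0, n = 30, x̄ = 138.5, and then s ≈ 53.0 with E = 10); the function names are chosen here for illustration.

```python
from math import ceil, sqrt

def margin_of_error(z_c, s, n):
    """E = z_c * s / sqrt(n)."""
    return z_c * s / sqrt(n)

def min_sample_size(z_c, sigma, E):
    """Smallest whole n with z_c * sigma / sqrt(n) <= E."""
    return ceil((z_c * sigma / E) ** 2)

E = margin_of_error(1.96, 51.0, 30)
interval = (138.5 - E, 138.5 + E)        # 95% confidence interval
n_needed = min_sample_size(1.96, 53.0, 10)
```

The sample size is always rounded up, because rounding down would give a margin of error slightly larger than the one requested.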


Section 6.2 


1a. d.f. = 21   b. c = 0.90   c. t_c = 1.721 
2a. t_c = 1.753, E ≈ 4.4; t_c = 2.947, E ≈ 7.4 
b. (157.6, 166.4); (154.6, 169.4) 


c. With 90% confidence, you can say that the population 
mean temperature of coffee sold is between 157.6°F and 
166.4°F. 


With 99% confidence, you can say that the population 
mean temperature of coffee sold is between 154.6°F and 
169.4°F. 


3a. t_c = 1.729, E ≈ 0.92; t_c = 2.093, E ≈ 1.12 
b. (8.83, 10.67); (8.63, 10.87) 


c. With 90% confidence, you can say that the population 
mean number of days the car model sits on the lot is 
between 8.83 and 10.67; with 95% confidence, you can 
say that the population mean number of days the car 
model sits on the lot is between 8.63 and 10.87. The 
90% confidence interval is slightly narrower. 


4. Use a t-distribution because the sample size is small 
(n < 30), the population is normally distributed, and 
the population standard deviation is unknown. 


Section 6.3 


1a. x = 181, n = 1006   b. p̂ ≈ 0.180 
2a. p̂ ≈ 0.180, q̂ ≈ 0.820 
b. np̂ ≈ 181 > 5 and nq̂ ≈ 825 > 5 
c. z_c = 1.645, E ≈ 0.020   d. (0.160, 0.200) 


e. With 90% confidence, you can say that the proportion 
of adults who think Abraham Lincoln was the greatest 
president is between 16.0% and 20.0%. 


3a. p̂ = 0.25, q̂ = 0.75 
b. np̂ = 124.5 > 5 and nq̂ = 373.5 > 5 
c. z_c = 2.575, E ≈ 0.050   d. (0.20, 0.30) 


e. With 99% confidence, you can say that the proportion of 
U.S. adults who think that people over 65 are the more 
dangerous drivers is between 20% and 30%. 


4a. (1) p̂ = 0.5, q̂ = 0.5, z_c = 1.645, E = 0.02 
(2) p̂ = 0.11, q̂ = 0.89, z_c = 1.645, E = 0.02 
b. (1) 1691.27 (2) 662.30 
c. (1) 1692 females (2) 663 females 
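The proportion answers above use p̂ = x/n, E = z_c · √(p̂q̂/n), and n = p̂q̂(z_c/E)². A sketch that recomputes them from the listed values (x = 181, n = 1006, z_c = 1.645, E = 0.02); the function name `sample_size_for_proportion` is chosen here for illustration.

```python
from math import ceil, sqrt

# Point estimate and 90% confidence interval.
x, n = 181, 1006
p_hat = x / n
q_hat = 1 - p_hat
E = 1.645 * sqrt(p_hat * q_hat / n)
interval = (p_hat - E, p_hat + E)

def sample_size_for_proportion(p_hat, z_c, E):
    """n = p_hat * q_hat * (z_c / E)^2, before rounding up."""
    return p_hat * (1 - p_hat) * (z_c / E) ** 2

n_no_estimate = sample_size_for_proportion(0.5, 1.645, 0.02)     # no prior estimate
n_with_estimate = sample_size_for_proportion(0.11, 1.645, 0.02)  # prior estimate 0.11
```

Using p̂ = 0.5 gives the largest possible product p̂q̂, which is why the sample size is larger when no preliminary estimate is available.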


Section 6.4 
1a. d.f. = 29, c = 0.90   b. 0.05, 0.95   c. 42.557, 17.708 


d. 90% of the area under the curve lies between 17.708 and 
42.557. 
2a. 42.557, 17.708; 45.722, 16.047 
b. (0.98, 2.36); (0.91, 2.60)   c. (0.99, 1.54); (0.96, 1.61) 


d. With 90% confidence, you can say that the population 
variance is between 0.98 and 2.36 and the population 
standard deviation is between 0.99 and 1.54. With 95% 
confidence, you can say that the population variance is 
between 0.91 and 2.60 and the population standard 
deviation is between 0.96 and 1.61. 


CHAPTER 7 


Section 7.1 


1a. (1) The mean is not 74 months. 
μ ≠ 74 
(2) The variance is less than or equal to 2.7. 
σ² ≤ 2.7 
(3) The proportion is more than 24%. 
p > 0.24 
b. (1) μ = 74 (2) σ² > 2.7 (3) p ≤ 0.24 
c. (1) H0: μ = 74; Ha: μ ≠ 74 (claim) 
(2) H0: σ² ≤ 2.7 (claim); Ha: σ² > 2.7 
(3) H0: p ≤ 0.24; Ha: p > 0.24 (claim) 
2a. H0: p ≤ 0.01; Ha: p > 0.01 

b. A type I error will occur if the actual proportion is less 
than or equal to 0.01, but you reject H0. 

A type II error will occur if the actual proportion is 
greater than 0.01, but you fail to reject H0. 

c. A type II error is more serious because you would be 
misleading the consumer, possibly causing serious injury 
or death. 

3a. (1) H0: The mean life of a certain type of automobile 
battery is 74 months. 
Ha: The mean life of a certain type of automobile 
battery is not 74 months. 
H0: μ = 74; Ha: μ ≠ 74 
(2) H0: The variance of the life of the home theater 
systems is less than or equal to 2.7. 
Ha: The variance of the life of the home theater 
systems is greater than 2.7. 
H0: σ² ≤ 2.7; Ha: σ² > 2.7 
(3) H0: The proportion of homeowners who feel their 
house is too small for their family is less than or 
equal to 24%. 
Ha: The proportion of homeowners who feel their 
house is too small for their family is greater 
than 24%. 
H0: p ≤ 0.24; Ha: p > 0.24 
b. (1) Two-tailed (2) Right-tailed (3) Right-tailed 
c. (1) [Graph: 1/2 P-value area in each tail] (2) [Graph: P-value area] 


4a. There is enough evidence to support the realtor’s claim 
that the proportion of homeowners who feel their house is 
too small for their family is more than 24%. 


b. There is not enough evidence to support the realtor’s claim 
that the proportion of homeowners who feel their house is 
too small for their family is more than 24%. 


5a. (1) Support claim. (2) Reject claim. 
b. (1) H0: μ ≥ 650; Ha: μ < 650 (claim) 
(2) H0: μ = 98.6 (claim); Ha: μ ≠ 98.6 


Section 7.2 
1a. (1) 0.0347 > 0.01 (2) 0.0347 < 0.05 
b. (1) Fail to reject H0. (2) Reject H0. 
2ab. 0.0436 
c. Reject H0 because 0.0436 < 0.05. 
3a. 0.9495   b. 0.1010 
c. Fail to reject H0 because 0.1010 > 0.01. 


4a. The claim is “the mean speed is greater than 35 miles 
per hour.” 
H0: μ ≤ 35; Ha: μ > 35 (claim) 
b. α = 0.05   c. 2.5   d. 0.0062   e. Reject H0. 
f. There is enough evidence at the 5% level of significance 
to support the claim that the average speed is greater than 
35 miles per hour. 


5a. The claim is “one of your distributors reports an average 
of 150 sales per day.” 
H0: μ = 150 (claim); Ha: μ ≠ 150 
b. α = 0.01   c. -2.76   d. 0.0058 
e. Reject H0 because 0.0058 < 0.01. 
f. There is enough evidence at the 1% level of significance 
to reject the claim that the distributorship averages 150 
sales per day. 


6a. 0.0440 > 0.01   b. Fail to reject H0. 
7a. [Graph: standard normal curve with α = 0.10]   b. 0.1003 
c. z0 = -1.28 
d. Rejection region: z < -1.28 


8a. [Graph]   b. 0.0401, 0.9599 
c. -z0 = -1.75, z0 = 1.75 
d. Rejection regions: z < -1.75, z > 1.75 


9a. The claim is “the mean work day of the company’s 
mechanical engineers is less than 8.5 hours.” 
H0: μ ≥ 8.5; Ha: μ < 8.5 (claim) 
b. α = 0.01 
c. z0 = -2.33; Rejection region: z < -2.33 
d. -3.55 [Graph: z = -3.55] 
e. Because -3.55 < -2.33, reject H0. 
f. There is enough evidence at the 1% level of significance 
to support the claim that the mean work day is less than 
8.5 hours. 


10a. α = 0.01 
b. -z0 = -2.575, z0 = 2.575; 
Rejection regions: z < -2.575, z > 2.575 
c. [Graph]; Fail to reject H0. 
d. There is not enough evidence at the 1% level of 
significance to reject the claim that the mean cost of 
raising a child from birth to age 2 by husband-wife 
families in the United States is $13,120. 
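The P-value decisions in this section follow one pattern: convert the standardized test statistic z to a tail area (doubling it for a two-tailed test), then reject H0 when the P-value is less than or equal to α. A sketch using the z-scores and significance levels listed above; the function names `p_value` and `decide` are chosen here for illustration.

```python
from statistics import NormalDist

def p_value(z, tail):
    """P-value for a z-test; tail is 'left', 'right', or 'two'."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * cdf(-abs(z))
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p, alpha):
    """Decision rule based on the P-value."""
    return "Reject H0" if p <= alpha else "Fail to reject H0"

p_speed = p_value(2.5, "right")     # claim: mean speed is greater than 35
p_sales = p_value(-2.76, "two")     # claim: mean is 150 sales per day
```

For example, `decide(p_speed, 0.05)` and `decide(p_sales, 0.01)` both return "Reject H0", matching the decisions listed for those two tests.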


Section 7.3 


la. 
2a. 
3a. 
4a. 


13. b. —2.650 
8 b. 1.397 
15 b. —2.131, 2.131 


The claim is “the mean cost of insuring a 2008 Honda 
CR-V is less than $1200.” 


H₀: μ ≥ $1200; Hₐ: μ < $1200 (claim)
b. α = 0.10, d.f. = 6
c. t₀ = −1.440; Rejection region: t < −1.440
d. −3.61
e. Reject H₀.
f. There is enough evidence at the 10% level of
significance to support the insurance agent’s claim that
the mean cost of insuring a 2008 Honda CR-V is less
than $1200.

5a. The claim is “the mean conductivity of the river is 1890
milligrams per liter.”


H₀: μ = 1890 (claim); Hₐ: μ ≠ 1890
b. α = 0.01, d.f. = 18
c. −t₀ = −2.878, t₀ = 2.878;
Rejection regions: t < −2.878, t > 2.878
d. 3.798
e. Reject H₀.


f. There is enough evidence at the 1% level of significance 
to reject the company’s claim that the mean conductivity 
of the river is 1890 milligrams per liter. 


6a. The claim is “the mean wait time is at most 18 minutes.” 
H₀: μ ≤ 18 minutes (claim); Hₐ: μ > 18 minutes
b. 0.9997
c. 0.9997 > 0.05; Fail to reject H₀.


d. There is not enough evidence at the 5% level of 
significance to reject the office’s claim that the mean 
wait time is at most 18 minutes. 


Section 7.4 
la. np = 31.25 > 5, ng = 93.75 > 5 


b. The claim is “more than 25% of U.S. adults have used a 
cellular phone to access the Internet.” 


H₀: p ≤ 0.25; Hₐ: p > 0.25 (claim)


c. α = 0.05
d. z₀ = 1.645; Rejection region: z > 1.645
e. 1.81
f. Reject H₀.


g. There is enough evidence at the 5% level of significance 
to support the research center’s claim that more than 
25% of U.S. adults have used a cellular phone to access 
the Internet. 

2a. np = 75 > 5, nq = 175 > 5

b. The claim is “30% of U.S. adults have not purchased a 
certain brand because they found the advertisements 
distasteful.” 

H₀: p = 0.30 (claim); Hₐ: p ≠ 0.30
c. α = 0.10
d. −z₀ = −1.645, z₀ = 1.645;
Rejection regions: z < −1.645, z > 1.645
e. 2.07
f. Reject H₀.
g. There is enough evidence at the 10% level of significance 


to reject the claim that 30% of U.S. adults have not 
purchased a certain brand because they found the 
advertisements distasteful. 


Section 7.5 

1a. d.f. = 17, α = 0.01 b. 33.409

2a. d.f. = 29, α = 0.05 b. 17.708

3a. d.f. = 50, α = 0.01 b. 79.490 c. 27.991


4a. The claim is “the variance of the amount of sports drink in
a 12-ounce bottle is no more than 0.40.”
H₀: σ² ≤ 0.40 (claim); Hₐ: σ² > 0.40
b. α = 0.01, d.f. = 30
c. χ₀² = 50.892; Rejection region: χ² > 50.892
d. 56.250
e. Reject H₀.
f. There is enough evidence at the 1% level of significance to
reject the bottling company’s claim that the variance of the
amount of sports drink in a 12-ounce bottle is no more
than 0.40.

5a. The claim is “the standard deviation of the lengths of
response times is less than 3.7 minutes.”
H₀: σ ≥ 3.7; Hₐ: σ < 3.7 (claim)
b. α = 0.05, d.f. = 8
c. χ₀² = 2.733; Rejection region: χ² < 2.733
d. 5.259 e. Fail to reject H₀.
f. There is not enough evidence at the 5% level of
significance to support the police chief’s claim that the
standard deviation of the lengths of response times is less
than 3.7 minutes.


6a. The claim is “the variance of the weight losses is 25.5.”
H₀: σ² = 25.5 (claim); Hₐ: σ² ≠ 25.5
b. α = 0.10, d.f. = 12
c. χ²_L = 5.226, χ²_R = 21.026;
Rejection regions: χ² < 5.226, χ² > 21.026
d. 5.082 e. Reject H₀.
f. There is enough evidence at the 10% level of significance
to reject the company’s claim that the variance of the
weight losses of the users is 25.5.


CHAPTER 8


Section 8.1 
1a. (1) Independent (2) Dependent
b. (1) Because each sample represents blood pressures of
different individuals, and it is not possible to form a
pairing between the members of the samples.
(2) Because the samples represent exam scores of the same
students, the samples can be paired with respect to
each student.

2a. The claim is “there is a difference in the mean annual
wages for forensic science technicians working for local
and state governments.”


H₀: μ₁ = μ₂; Hₐ: μ₁ ≠ μ₂ (claim)
b. α = 0.10
c. −z₀ = −1.645, z₀ = 1.645;
Rejection regions: z < −1.645, z > 1.645
d. 1.667
e. Reject H₀.
f. There is enough evidence at the 10% level of significance
to support the claim that there is a difference in the mean
annual wages for forensic science technicians working for
local and state governments.


3a. z ≈ 1.36; P = 0.0865
b. Fail to reject H₀.
c. There is not enough evidence at the 5% level of
significance to support the travel agency’s claim that the
average daily cost of meals and lodging for vacationing
in Alaska is greater than the same average cost for
vacationing in Colorado.


Section 8.2 


1a. The claim is “there is a difference in the mean annual
earnings based on level of education.”
H₀: μ₁ = μ₂; Hₐ: μ₁ ≠ μ₂ (claim)
b. α = 0.01; d.f. = 11
c. −t₀ = −3.106, t₀ = 3.106; Rejection regions: t < −3.106,
t > 3.106
d. −4.63
e. Reject H₀.
f. There is enough evidence at the 1% level of significance to
support the claim that there is a difference in the mean
annual earnings based on level of education.


2a. The claim is “the watt usage of a manufacturer’s 17-inch
flat panel monitors is less than that of its leading
competitor.”
H₀: μ₁ ≥ μ₂; Hₐ: μ₁ < μ₂ (claim)
b. α = 0.10; d.f. = 25
c. t₀ = −1.316; Rejection region: t < −1.316
d. −3.997
e. Reject H₀.
f. There is enough evidence at the 10% level of significance
to support the manufacturer’s claim that the watt usage of
its monitors is less than that of its leading competitor.


Section 8.3 


1a. The claim is “athletes can decrease their times in the
40-yard dash.”
H₀: μ_d ≤ 0; Hₐ: μ_d > 0 (claim)
b. α = 0.05; d.f. = 11
c. t₀ = 1.796; Rejection region: t > 1.796
d. d̄ ≈ 0.0233; s_d ≈ 0.0607
e. 1.333
f. Fail to reject H₀.
g. There is not enough evidence at the 5% level of significance
to support the claim that athletes can decrease their times
in the 40-yard dash.
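
The paired-sample test statistics in this section follow from the usual formula t = d̄ / (s_d / √n), with n = d.f. + 1. The sketch below is illustrative only; `paired_t` is a hypothetical helper name. It uses the rounded summary values printed in the answers, so the first result (about 1.330) differs slightly from the printed 1.333, which presumably comes from unrounded data. The second set of values is from answer 2 of this section.

```python
from math import sqrt

def paired_t(d_bar, s_d, n):
    """Standardized test statistic for the mean of paired differences."""
    return d_bar / (s_d / sqrt(n))

# Answer 1: d.f. = 11, so n = 12 pairs (rounded summary values from part d)
t_dash = paired_t(0.0233, 0.0607, 12)

# Answer 2: d.f. = 6, so n = 7 pairs
t_drug = paired_t(0.5571, 0.9235, 7)

print(round(t_dash, 3))  # compare with 1.333 (rounding of inputs explains the gap)
print(round(t_drug, 3))  # compare with 1.596
```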


2a. The claim is “the drug changes the body’s temperature.”
H₀: μ_d = 0; Hₐ: μ_d ≠ 0 (claim)
b. α = 0.05; d.f. = 6
c. −t₀ = −2.447, t₀ = 2.447;
Rejection regions: t < −2.447, t > 2.447
d. d̄ ≈ 0.5571; s_d ≈ 0.9235
e. 1.596
f. Fail to reject H₀.
g. There is not enough evidence at the 5% level of significance
to support the claim that the drug changes the body’s
temperature.

Section 8.4
1a. The claim is “there is a difference between the proportion
of male high school students who smoke cigarettes and
the proportion of female high school students who smoke
cigarettes.”
H₀: p₁ = p₂; Hₐ: p₁ ≠ p₂ (claim)
b. α = 0.05
c. −z₀ = −1.96, z₀ = 1.96;
Rejection regions: z < −1.96, z > 1.96
d. p̄ ≈ 0.1975; q̄ ≈ 0.8025
e. n₁p̄ ≈ 1382.5 > 5, n₁q̄ ≈ 5617.5 > 5, n₂p̄ ≈ 1479.1 > 5,
and n₂q̄ ≈ 6009.9 > 5.
f. 4.23
g. Reject H₀.
h. There is enough evidence at the 5% level of significance
to support the claim that there is a difference between
the proportion of male high school students who smoke
cigarettes and the proportion of female high school
students who smoke cigarettes.

2a. The claim is “the proportion of male high school students
who smoke cigars is greater than the proportion of female
high school students who smoke cigars.”
H₀: p₁ ≤ p₂; Hₐ: p₁ > p₂ (claim)
b. α = 0.05 c. z₀ = 1.645; Rejection region: z > 1.645
d. p̄ ≈ 0.1174; q̄ ≈ 0.8826
e. n₁p̄ ≈ 821.8 > 5, n₁q̄ ≈ 6178.2 > 5, n₂p̄ ≈ 879.2 > 5,
and n₂q̄ ≈ 6609.8 > 5
f. 17.565
g. Reject H₀.
h. There is enough evidence at the 5% level of significance
to support the claim that the proportion of male high
school students who smoke cigars is greater than the
proportion of female high school students who smoke
cigars.


CHAPTER 9

Section 9.1
1ab. [Scatter plot: years out of school (horizontal axis) versus annual contribution (vertical axis)]
c. Yes, it appears that there is a negative linear correlation.
As the number of years out of school increases, the
annual contribution decreases.

2ab. [Scatter plot: height versus pulse rate (in beats per minute)]
c. No, it appears that there is no linear correlation between
height and pulse rate.

3ab. [Scatter plot: team salary versus average attendance per home game]
c. Yes, it appears that there is a positive linear correlation.
As the team salary increases, the average attendance per
home game increases.

4a. n = 7; Σx = 88, Σy = 56.7, Σxy = 435.6, Σx² = 1836,
Σy² = 587.05
b. −0.908
c. Because r is close to −1, this suggests a strong negative
linear correlation between years out of school and annual
contribution.

5ab. 0.750
c. Because r is close to 1, this suggests a strong positive
linear correlation between the salaries and the average
attendances at home games.

6a. 7 b. 0.01 c. 0.875
d. |r| ≈ 0.908 > 0.875; The correlation is significant.
e. There is enough evidence at the 1% level of significance
to conclude that there is a significant linear correlation
between the number of years out of school and the annual
contribution.

7a. H₀: ρ = 0; Hₐ: ρ ≠ 0 b. 0.01 c. 28
d. −t₀ = −2.763, t₀ = 2.763;
Rejection regions: t < −2.763, t > 2.763
e. 5.995 f. Reject H₀.
g. There is enough evidence at the 1% level of significance to
conclude that there is a significant linear correlation
between the salaries and average attendances at home
games for the teams in Major League Baseball.


Section 9.2 
1a. n = 7, Σx = 88, Σy = 56.7, Σxy = 435.6, Σx² = 1836
b. m ≈ −0.379875; b ≈ 12.8756
c. ŷ = −0.380x + 12.876
2a. Enter the data. b. m ≈ 189.038015; b ≈ 13,497.9583
c. ŷ = 189.038x + 13,497.958
3a. (1) ŷ = 12.481(2) + 33.683
(2) ŷ = 12.481(3.32) + 33.683
b. (1) 58.645 (2) 75.120
c. (1) 58.645 minutes (2) 75.120 minutes
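
The slope, intercept, and correlation coefficient for the years-out-of-school data can be reproduced from the sums listed in answer 1a (with Σy² = 587.05 taken from Section 9.1, answer 4a). This is a minimal sketch of the standard computational formulas; the variable names are illustrative.

```python
from math import sqrt

# Sums from the answers: n, Σx, Σy, Σxy, Σx², Σy²
n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 7, 88, 56.7, 435.6, 1836, 587.05

numerator = n * sum_xy - sum_x * sum_y
x_spread = n * sum_x2 - sum_x ** 2
y_spread = n * sum_y2 - sum_y ** 2

r = numerator / sqrt(x_spread * y_spread)  # correlation coefficient
m = numerator / x_spread                   # slope of the regression line
b = sum_y / n - m * (sum_x / n)            # y-intercept: y-bar minus m times x-bar

def predict(x, slope=12.481, intercept=33.683):
    """Predicted value from the regression equation used in answer 3."""
    return slope * x + intercept

print(round(r, 3), round(m, 6), round(b, 4))       # compare with -0.908, -0.379875, 12.8756
print(round(predict(2), 3), round(predict(3.32), 3))  # compare with 58.645 and 75.120
```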


Section 9.3 
1a. 0.979 b. 0.958
c. About 95.8% of the variation in the times is explained.
About 4.2% of the variation is unexplained.


2a.
x_i | y_i | ŷ_i | y_i − ŷ_i | (y_i − ŷ_i)²
15 | 26 | 28.386 | −2.386 | 5.692996
20 | 32 | 35.411 | −3.411 | 11.634921
20 | 38 | 35.411 | 2.589 | 6.702921
30 | 56 | 49.461 | 6.539 | 42.758521
40 | 54 | 63.511 | −9.511 | 90.459121
45 | 78 | 70.536 | 7.464 | 55.711296
50 | 80 | 77.561 | 2.439 | 5.948721
60 | 88 | 91.611 | −3.611 | 13.039321
Σ = 231.947818
b. 8 c. 6.218


d. The standard error of estimate of the weekly sales for a 
specific radio ad time is about $621.80. 


3a. n = 10, d.f. = 8, t_c = 2.306, s_e ≈ 138.255
b. 886.897 c. 364.088
d. 522.809 < y < 1250.985


e. You can be 95% confident that when the gross domestic 
product is $4 trillion, the carbon dioxide emissions will be 
between 522.809 and 1250.985 million metric tons. 
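
The standard error of estimate in answer 2 and the prediction interval in answer 3 can be checked with a few lines. The sketch below assumes the usual formula s_e = √(Σ(y − ŷ)² / (n − 2)) and takes the squared residuals, point estimate, and margin of error directly from the answers; the variable names are illustrative.

```python
from math import sqrt

# Squared residuals (y_i - y_hat_i)^2 from the table in answer 2a
squared_residuals = [5.692996, 11.634921, 6.702921, 42.758521,
                     90.459121, 55.711296, 5.948721, 13.039321]

n = len(squared_residuals)
s_e = sqrt(sum(squared_residuals) / (n - 2))  # standard error of estimate

# Answer 3: prediction interval = point estimate plus or minus margin of error
y_hat, margin = 886.897, 364.088
lower, upper = y_hat - margin, y_hat + margin

print(round(s_e, 3))                     # compare with 6.218
print(round(lower, 3), round(upper, 3))  # compare with 522.809 and 1250.985
```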


Section 9.4 


la. Enter the data. 
b. ŷ = 46.385 + 0.540x₁ − 4.897x₂

2ab. (1) ŷ = 46.385 + 0.540(89) − 4.897(1)
(2) ŷ = 46.385 + 0.540(78) − 4.897(3)
(3) ŷ = 46.385 + 0.540(83) − 4.897(2)
c. (1) ŷ = 89.548 (2) ŷ = 73.814 (3) ŷ = 81.411
d. (1) 90 (2) 74 (3) 81


CHAPTER 10 


Section 10.1 

1.
Tax preparation method | % of people | Expected frequency
Accountant | 25% | 125
By hand | 20% | 100
Computer software | 35% | 175
Friend/family | 5% | 25
Tax preparation service | 15% | 75


2a. The expected frequencies are 64, 80, 32, 56, 60, 48, 40, and
20, all of which are at least 5.
b. Claimed distribution:

Ages | Distribution
0-9 | 16%
10-19 | 20%
20-29 | 8%
30-39 | 14%
40-49 | 15%
50-59 | 12%
60-69 | 10%
70+ | 5%

H₀: The distribution of ages is as shown in the table above.
Hₐ: The distribution of ages differs from the claimed
distribution. (claim)
c. 0.05 d. 7
e. χ₀² = 14.067; Rejection region: χ² > 14.067
f. 6.694
g. Fail to reject H₀.
h. There is not enough evidence at the 5% level of
significance to support the sociologist’s claim that the age
distribution differs from the age distribution 10 years ago.

3a. The expected frequency for each category is 30, which is at
least 5.


b. Claimed distribution: 


Color Distribution 
Brown 16.6% 
Yellow 16.6% 
Red 16.6% 
Blue 16.6% 
Orange 16.6% 
Green 16.6% 


H₀: The distribution of colors is uniform, as shown in the
table above. (claim)
Hₐ: The distribution of colors is not uniform.
c. 0.05 d. 5
e. χ₀² = 11.071; Rejection region: χ² > 11.071
f. 12.933
g. Reject H₀.


h. There is enough evidence at the 5% level of significance to 
reject the claim that the distribution of different-colored 
candies in bags of peanut M&M’s is uniform. 
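
The expected frequencies in these goodness-of-fit answers are each E = n × p. In the sketch below the sample sizes 500, 400, and 180 are not stated in the answers; they are inferred from the printed expected frequencies (for example 125 / 0.25 = 500) and should be read as assumptions. The helper name is illustrative.

```python
def expected_frequencies(n, proportions):
    """Expected frequency of each category: sample size times claimed proportion."""
    return [n * p for p in proportions]

# Answer 1: tax preparation methods (n = 500 inferred from 125 / 0.25)
tax = expected_frequencies(500, [0.25, 0.20, 0.35, 0.05, 0.15])

# Answer 2: age distribution (n = 400 inferred from 64 / 0.16)
ages = expected_frequencies(400, [0.16, 0.20, 0.08, 0.14, 0.15, 0.12, 0.10, 0.05])

# Answer 3: six equally likely colors (n = 180 inferred from 30 per category)
colors = expected_frequencies(180, [1 / 6] * 6)

for name, freqs in [("tax", tax), ("ages", ages), ("colors", colors)]:
    # The chi-square goodness-of-fit test requires every expected frequency >= 5
    print(name, [round(f) for f in freqs], all(f >= 5 for f in freqs))
```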


Section 10.2 
la. Marginal frequencies: Row 1: 180; Row 2: 120; Column 1: 74; 
Column 2: 162; Column 3: 28; Column 4: 36 
b. 300 


c. E₁,₁ = 44.4, E₁,₂ = 97.2, E₁,₃ = 16.8, E₁,₄ = 21.6,
E₂,₁ = 29.6, E₂,₂ = 64.8, E₂,₃ = 11.2, E₂,₄ = 14.4
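
Each expected frequency in part c is (row total × column total) / sample size, computed from the marginal frequencies in part a. A minimal sketch follows; the variable names are illustrative.

```python
# Marginal frequencies and sample size from answers 1a and 1b
row_totals = [180, 120]
column_totals = [74, 162, 28, 36]
sample_size = 300

# E[r][c] = (sum of row r) * (sum of column c) / sample size
expected = [[r * c / sample_size for c in column_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
# compare with 44.4, 97.2, 16.8, 21.6 and 29.6, 64.8, 11.2, 14.4
```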


2a. H₀: Travel concern is independent of travel purpose.
Hₐ: Travel concern is dependent on travel purpose. (claim)
b. 0.01 c. 3
d. χ₀² = 11.345; Rejection region: χ² > 11.345
e. 8.158
f. Fail to reject H₀.
g. There is not enough evidence at the 1% level of
significance for the consultant to conclude that travel
concern is dependent on travel purpose.


3a. H₀: Whether or not a tax cut would influence an adult to
purchase a hybrid vehicle is independent of age.
Hₐ: Whether or not a tax cut would influence an adult to
purchase a hybrid vehicle is dependent on age. (claim)
b. Enter the data.
c. χ₀² = 9.210; Rejection region: χ² > 9.210
d. 15.306 e. Reject H₀.


f. There is enough evidence at the 1% level of significance to 
conclude that whether or not a tax cut would influence an 
adult to purchase a hybrid vehicle is dependent on age. 


Section 10.3 
1a. 0.05 b. 2.45
2a. 0.01 b. 18.31

3a. H₀: σ₁² ≤ σ₂²; Hₐ: σ₁² > σ₂² (claim)
b. 0.01 c. d.f.N = 24, d.f.D = 19
d. F₀ = 2.92; Rejection region: F > 2.92
e. 3.21
f. Reject H₀.

g. There is enough evidence at the 1% level of significance to 
support the researcher’s claim that a specially treated 
intravenous solution decreases the variance of the time 
required for nutrients to enter the bloodstream. 

4a. H₀: σ₁ = σ₂ (claim); Hₐ: σ₁ ≠ σ₂
b. 0.01 c. d.f.N = 15, d.f.D = 21
d. F₀ = 3.43; Rejection region: F > 3.43
e. 1.48 f. Fail to reject H₀.
g. There is not enough evidence at the 1% level of
significance to reject the biologist’s claim that the pH
levels of the soil in the two geographic locations have
equal standard deviations.


Section 10.4 


1a. H₀: μ₁ = μ₂ = μ₃ = μ₄
Hₐ: At least one mean is different from the others. (claim)
b. 0.05 c. d.f.N = 3, d.f.D = 14
d. F₀ = 3.34; Rejection region: F > 3.34
e. 4.22
f. Reject H₀.
g. There is enough evidence at the 5% level of significance
for the analyst to conclude that there is a difference in the
mean monthly sales among the sales regions.


2a. H₀: μ₁ = μ₂ = μ₃ = μ₄
Hₐ: At least one mean is different from the others. (claim)
b. Enter the data.
c. F = 1.34; P-value ≈ 0.280
d. Fail to reject H₀.
e. There is not enough evidence at the 5% level of
significance to conclude that there is a difference in the
means of the GPAs.


CHAPTER 11 


Section 11.1 


1a. H₀: median ≤ 120; Hₐ: median > 120 (claim)
b. 0.025 c. 23 d. 6 e. 6 f. Reject H₀.
g. There is enough evidence at the 2.5% level of significance
to support the agency’s claim that the median number of
days a home is on the market in its city is greater than 120.

2a. H₀: median = 9.4 (claim); Hₐ: median ≠ 9.4
b. 0.10 c. 92 d. −1.645 e. −0.94
f. Fail to reject H₀.
g. There is not enough evidence at the 10% level of
significance to reject the organization’s claim that
the median age of automobiles in operation in the
United States is 9.4 years.

3a. H₀: The number of colds will not decrease.
Hₐ: The number of colds will decrease. (claim)
b. 0.05 c. 11 d. 2 e. 2 f. Reject H₀.
g. There is enough evidence at the 5% level of significance
to support the researcher’s claim that a new vaccine will
decrease the number of colds in adults.


Section 11.2 


1a. H₀: There is no difference in the amounts of water
repelled.
Hₐ: There is a difference in the amounts of water repelled.
(claim)
b. 0.01 c. 11 d. 5
e.
No repellent | Repellent applied | Difference | Absolute value | Rank | Signed rank
8 | 15 | −7 | 7 | 11 | −11
7 | 12 | −5 | 5 | 9 | −9
7 | 11 | −4 | 4 | 7.5 | −7.5
4 | 6 | −2 | 2 | 3.5 | −3.5
6 | 6 | 0 | 0 | |
10 | 8 | 2 | 2 | 3.5 | 3.5
9 | 8 | 1 | 1 | 1.5 | 1.5
5 | 6 | −1 | 1 | 1.5 | −1.5
9 | 12 | −3 | 3 | 5.5 | −5.5
11 | 8 | 3 | 3 | 5.5 | 5.5
8 | 14 | −6 | 6 | 10 | −10
4 | 8 | −4 | 4 | 7.5 | −7.5
w_s = 10.5
f. Fail to reject H₀.
g. There is not enough evidence at the 1% level of
significance for the quality control inspector to conclude
that the spray-on water repellent is effective.
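
The Wilcoxon signed-rank statistic in answer 1 can be recomputed from the two data columns. The sketch below drops zero differences, ranks the absolute differences with tied values sharing the mean of their ranks, and takes the smaller of the two signed-rank sums. The helper `average_ranks` is an illustrative name, not from the text.

```python
no_repellent = [8, 7, 7, 4, 6, 10, 9, 5, 9, 11, 8, 4]
repellent = [15, 12, 11, 6, 6, 8, 8, 6, 12, 8, 14, 8]

def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# Differences of zero are discarded before ranking
diffs = [a - b for a, b in zip(no_repellent, repellent) if a != b]
ranks = average_ranks([abs(d) for d in diffs])

positive_sum = sum(r for d, r in zip(diffs, ranks) if d > 0)
negative_sum = sum(r for d, r in zip(diffs, ranks) if d < 0)
w_s = min(positive_sum, negative_sum)

print(len(diffs), positive_sum, negative_sum, w_s)  # compare with n = 11 and w_s = 10.5
```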

2a. H₀: There is no difference in the claims paid by the
companies.
Hₐ: There is a difference in the claims paid by the
companies. (claim)
b. 0.05
c. −z₀ = −1.96, z₀ = 1.96;
Rejection regions: z < −1.96, z > 1.96
d. n₁ = 12 and n₂ = 12
e.
Ordered data | Sample | Rank | Ordered data | Sample | Rank
1.7 | B | 1 | 5.3 | B | 13
1.8 | B | 2 | 5.6 | B | 14
2.2 | B | 3 | 5.8 | A | 15
2.5 | A | 4 | 6.0 | A | 16
3.0 | A | 5.5 | 6.2 | A | 17
3.0 | B | 5.5 | 6.3 | A | 18
3.4 | B | 7 | 6.5 | A | 19
3.9 | A | 8 | 7.3 | B | 20
4.1 | B | 9 | 7.4 | A | 21
4.4 | B | 10 | 9.9 | A | 22
4.5 | A | 11 | 10.6 | A | 23
4.7 | B | 12 | 10.8 | B | 24
R = 120.5 (or R = 179.5)
f. −1.703 (or 1.703)
g. Fail to reject H₀.
h. There is not enough evidence at the 5% level of
significance to conclude that there is a difference in the
claims paid by the companies.
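
The rank sum and standardized test statistic in answer 2 can be verified as follows. This sketch assumes the usual normal approximation for the Wilcoxon rank-sum test, with μ_R = n₁(n₁ + n₂ + 1)/2 and σ_R = √(n₁n₂(n₁ + n₂ + 1)/12); the ranks are those listed for sample B in the table.

```python
from math import sqrt

# Ranks of the sample B entries, read from the table in answer 2
ranks_b = [1, 2, 3, 5.5, 7, 9, 10, 12, 13, 14, 20, 24]
n1 = n2 = 12

R = sum(ranks_b)
mu_R = n1 * (n1 + n2 + 1) / 2
sigma_R = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (R - mu_R) / sigma_R

print(R, round(z, 3))  # compare with R = 120.5 and z = -1.703
```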
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Section 11.3 


1a. H₀: There is no difference in the salaries in the three
states.
Hₐ: There is a difference in the salaries in the three states.
(claim)
b. 0.05 c. 2
d. χ₀² = 5.991; Rejection region: χ² > 5.991
e.
Ordered data | State | Rank | Ordered data | State | Rank
88.28 | CA | 1 | 99.70 | CA | 15
88.80 | PA | 2 | 99.75 | NY | 16
92.50 | NY | 3 | 99.95 | CA | 17
93.10 | NY | 4 | 99.99 | PA | 18
94.40 | NY | 5 | 100.55 | PA | 19
95.15 | PA | 6 | 100.75 | CA | 20
96.25 | CA | 7 | 101.20 | CA | 21
97.25 | PA | 8 | 101.55 | NY | 22
97.44 | PA | 9 | 101.97 | NY | 23
97.50 | CA | 10.5 | 102.35 | NY | 24
97.50 | NY | 10.5 | 103.20 | CA | 25
97.89 | NY | 12 | 103.70 | PA | 26
98.85 | CA | 13 | 110.45 | PA | 27
99.20 | PA | 14 | 113.90 | CA | 28
R₁ = 157.5
R₂ = 129
R₃ = 119.5
f. 0.433
g. Fail to reject H₀.
h. There is not enough evidence at the 5% level of
significance to conclude that the distributions of the
veterinarians’ salaries in these three states are different.
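
The Kruskal-Wallis statistic H = 0.433 follows from the three rank sums. In the sketch below the sample sizes 10, 9, and 9 are counted from the table, and matching them to R₁, R₂, R₃ (CA, PA, NY in that order) is an inference from the listed ranks rather than something the answer states. The function name is illustrative.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with N the total sample size."""
    N = sum(sizes)
    weighted = sum(R * R / n for R, n in zip(rank_sums, sizes))
    return 12 / (N * (N + 1)) * weighted - 3 * (N + 1)

# Rank sums from the answer; sizes counted from the table (assumed order CA, PA, NY)
H = kruskal_wallis_h([157.5, 129, 119.5], [10, 9, 9])

print(round(H, 3))  # compare with 0.433, well below the critical value 5.991
```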


Section 11.4 

1a. H₀: ρₛ = 0; Hₐ: ρₛ ≠ 0 (claim)
b. 0.01 c. 0.929
d.
Male | Rank | Female | Rank | d | d²
25 | 3.5 | 20 | 1.5 | 2 | 4
24 | 1.5 | 20 | 1.5 | 0 | 0
24 | 1.5 | 22 | 3.0 | −1.5 | 2.25
25 | 3.5 | 23 | 4.0 | −0.5 | 0.25
27 | 5.0 | 26 | 5.0 | 0 | 0
29 | 6.0 | 27 | 6.0 | 0 | 0
30 | 7.0 | 30 | 7.0 | 0 | 0
Σd² = 6.5
e. 0.884 f. Fail to reject H₀.
g. There is not enough evidence at the 1% level of
significance to conclude that a significant correlation
exists between the number of males and females who
received doctoral degrees.
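
The Spearman rank correlation coefficient in this answer comes from rₛ = 1 − 6Σd² / (n(n² − 1)). A minimal sketch using the d² column of the table; the variable names are illustrative.

```python
# Squared rank differences d^2 from the table
d_squared = [4, 0, 2.25, 0.25, 0, 0, 0]
n = len(d_squared)

r_s = 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))
critical_value = 0.929  # from part c of the answer

print(sum(d_squared), round(r_s, 3))  # compare with 6.5 and 0.884
print(abs(r_s) > critical_value)      # False, so H0 is not rejected
```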


Section 11.5 


1a. P P P F P F P P P P F F P F P P
F F F P P P F P P P
b. 13
c. 3, 1, 1, 1, 4, 2, 1, 1, 2, 3, 3, 1, 3


2a. H₀: The sequence of genders is random.
Hₐ: The sequence of genders is not random. (claim)
b. 0.05
c. n₁ = number of F’s = 9
n₂ = number of M’s = 6
G = number of runs = 8
d. lower critical value = 4
upper critical value = 13
e. 8 f. Fail to reject H₀.
g. There is not enough evidence at the 5% level of
significance to support the claim that the sequence of
genders is not random.


3a. H₀: The sequence of weather conditions is random.
Hₐ: The sequence of weather conditions is not random.
(claim)
b. 0.05
c. n₁ = number of N’s = 21
n₂ = number of S’s = 10
G = number of runs = 17
d. ±1.96 e. 1.03 f. Fail to reject H₀.
g. There is not enough evidence at the 5% level of
significance to support the claim that the sequence of
weather conditions is not random.
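
The run count in answer 1 and the standardized statistic in answer 3 can be checked with the sketch below. It assumes the usual normal approximation for the runs test, μ_G = 2n₁n₂/(n₁ + n₂) + 1 and σ_G = √(2n₁n₂(2n₁n₂ − n₁ − n₂) / ((n₁ + n₂)²(n₁ + n₂ − 1))). The P/F sequence is rebuilt from the run lengths listed in answer 1, so treat it as a reconstruction; the function names are illustrative.

```python
from math import sqrt

def count_runs(sequence):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    return sum(1 for i, s in enumerate(sequence) if i == 0 or s != sequence[i - 1])

def runs_test_z(n1, n2, G):
    """Standardized test statistic for the runs test (normal approximation)."""
    mu_G = 2 * n1 * n2 / (n1 + n2) + 1
    sigma_G = sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                   / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (G - mu_G) / sigma_G

# Answer 1: sequence rebuilt from run lengths 3,1,1,1,4,2,1,1,2,3,3,1,3 (assumption)
sequence = "PPPFPFPPPPFFPFPPFFFPPPFPPP"
runs = count_runs(sequence)

# Answer 3: n1 = 21 N's, n2 = 10 S's, G = 17 runs
z = runs_test_z(21, 10, 17)

print(runs, round(z, 2))  # compare with 13 runs and z = 1.03
```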
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APPENDIX A 
1. (1) 0.4857 


(2) z = ±2.17
2a. [Graph]
b. 0.4834 c. 0.9834

3a. [Graph: standard normal curve marked at z = −2.16]
b. 0.4846 c. 0.9846

4a. [Graph: standard normal curve marked at z = −2.165 and z = −1.35]
b. 0.4848; 0.4115 c. 0.0733


APPENDIX C 
1a. The points do not appear to be approximately linear.


b. 39,860 is a possible outlier because it is far removed from 
the other entries in the data set. 


c. Because the points do not appear to be approximately 
linear and there is an outlier, you can conclude that the 
sample data do not come from a population that has a 
normal distribution. 
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Odd Answers 


CHAPTER 1 


Section 1.1 (page 6) 


1. A sample is a subset of a population.

3. A parameter is a numerical description of a population
characteristic. A statistic is a numerical description of a
sample characteristic.

5. False. A statistic is a numerical measure that describes a
sample characteristic.

7. True

9. False. A population is the collection of all outcomes,
responses, measurements, or counts that are of interest.

11. Population, because it is a collection of the heights of all
the players on the school’s basketball team.

13. Sample, because the collection of the 500 spectators is a
subset of the population.

15. Sample, because the collection of the 20 patients is a
subset of the population.

17. Population, because it is a collection of all the golfers’
scores in the tournament.

19. Population, because it is a collection of all the U.S.
presidents’ political parties.

21. Population: Parties of registered voters in Warren County
Sample: Parties of Warren County voters who respond
to online survey

23. Population: Ages of adults in the United States who own
cellular phones
Sample: Ages of adults in the United States who own
Samsung cellular phones

25. Population: Collection of the responses of all adults in the
United States
Sample: Collection of the responses of the 1000 adults
surveyed

27. Population: Collection of the immunization status of all
adults in the United States
Sample: Collection of the immunization status of the
1442 adults surveyed

29. Population: Collection of the opinions of all registered
voters
Sample: Collection of the opinions of the 800 registered
voters surveyed

31. Population: Collection of the investor role of all women in
the United States
Sample: Collection of the investor role of the 546 U.S.
women surveyed

33. Population: Collection of the responses of all Fortune
magazine’s top 100 best companies to work for
Sample: Collection of the responses of the 85 companies
who responded to the questionnaire

35. Statistic. The value $68,000 is a numerical description of a
sample of annual salaries.

37. Parameter. The 62 surviving passengers out of 97 total
passengers is a numerical description of all of the
passengers of the Hindenburg that survived.

39. Statistic. 8% is a numerical description of a sample of
computer users.

41. Statistic. 44% is a numerical description of a sample of all
people.

43. The statement “more than 56% are the primary investors
in their households” is an example of descriptive statistics.
An inference drawn from the sample is that an association
exists between U.S. women and being the primary
investors in their households.

45. Answers will vary.

47. (a) An inference drawn from the sample is that senior
citizens who live in Florida have better memories than
senior citizens who do not live in Florida.
(b) This inference may incorrectly imply that if you live in
Florida you will have a better memory.

49. Answers will vary.


Section 1.2 (page 13)

1. Nominal and ordinal

3. False. Data at the ordinal level can be qualitative or
quantitative.

5. False. More types of calculations can be performed with
data at the interval level than with data at the nominal
level.

7. Qualitative, because telephone numbers are labels.

9. Quantitative, because body temperatures are numerical
measurements.

11. Quantitative, because song lengths are numerical
measurements.

13. Qualitative, because player numbers are labels.

15. Quantitative, because infant weights are numerical
measurements.

17. Qualitative, because the poll responses are attributes.

19. Qualitative. Ordinal. Data can be arranged in order, but
the differences between data entries make no sense.

21. Qualitative. Nominal. No mathematical computations can
be made and data are categorized by region.

23. Qualitative. Ordinal. Data can be arranged in order, but
the differences between data entries are not meaningful.

25. Ordinal 27. Nominal

29. (a) Interval (b) Nominal (c) Ratio (d) Ordinal

31. An inherent zero is a zero that implies “none.” Answers
will vary.
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Section 1.3 (page 23) 


1. In an experiment, a treatment is applied to part of
a population and responses are observed. In an
observational study, a researcher measures characteristics
of interest of a part of a population but does not change
existing conditions.

3. In a random sample, every member of the population has
an equal chance of being selected. In a simple random
sample, every possible sample of the same size has an
equal chance of being selected.

5. True

7. False. Using stratified sampling guarantees that members
of each group within a population will be sampled.

9. False. A systematic sample is selected by ordering a
population in some way and then selecting members of
the population at regular intervals.

11. Use a census because all the patients are accessible and
the number of patients is not too large.

13. Perform an experiment because you want to measure the
effect of a treatment on the human digestive system.

15. Use a simulation because the situation is impractical and
dangerous to create in real life.

17. (a) The experimental units are the 30- to 35-year-old
females being given the treatment. The treatment is the
new allergy drug.
(b) A problem with the design is that there may be some
bias on the part of the researcher if the researcher
knows which patients were given the real drug. A way
to eliminate this problem would be to make the study
into a double-blind experiment.
(c) The study would be a double-blind study if the
researcher did not know which patients received the
real drug or the placebo.

19. Simple random sampling is used because each telephone
number has an equal chance of being dialed, and all
samples of 1400 phone numbers have an equal chance of
being selected. The sample may be biased because only
homes with telephones will be sampled.

21. Convenience sampling is used because the students are
chosen due to their convenience of location. Bias may
enter into the sample because the students sampled may
not be representative of the population of students.

23. Simple random sampling is used because each customer
has an equal chance of being contacted, and all samples
of 580 customers have an equal chance of being selected.

25. Stratified sampling is used because a sample is taken from
each one-acre subplot.

27. Answers will vary.

29. Answers will vary. Sample answer: Treatment group: Jake,
Maria, Lucy, Adam, Bridget, Vanessa, Rick, Dan, and Mary.
Control group: Mike, Ron, Carlos, Steve, Susan, Kate, Pete,
Judy, and Connie. A random number table is used.

31. Census, because it is relatively easy to obtain the ages of
the 115 residents.


33. The question is biased because it already suggests that 
eating whole-grain foods improves your health. The 
question might be rewritten as “How does eating 
whole-grain foods affect your health?” 

35. The survey question is unbiased. 

37. The households sampled represent various locations, 
ethnic groups, and income brackets. Each of these 
variables is considered a stratum. Stratified sampling 
ensures that each segment of the population is 
represented. 

39. Observational studies may be referred to as natural 
experiments because they involve observing naturally 
occurring events that are not influenced by the study. 

41. (a) Advantage: Usually results in a savings in the survey 

cost. 
Disadvantage: There tends to be a lower response rate 
and this can introduce a bias into the sample. Only a 
certain segment of the population might respond. 

(b) Sampling technique: Convenience sampling 

43. If blinding is not used, then the placebo effect is more 
likely to occur. 

45. Both a randomized block design and a stratified sample 
split their members into groups based on similar 
characteristics. 


Section 1.3 Activity (page 26) 
1. Answers will vary. The list contains one number at least 
twice. 


2. The minimum is 1, the maximum is 731, and the number of 
samples is 8. Answers will vary. 


Uses and Abuses for Chapter 1 (page 27) 


1. Answers will vary. 2. Answers will vary. 


Review Answers for Chapter 1 (page 29) 
1. Population: Collection of the opinions of all U.S. adults 
about credit cards 


Sample: Collection of the opinions of the 1000 U.S. adults 
surveyed about credit cards 


3. Population: Collection of the average annual percentage 
rates of all credit cards 


Sample: Collection of the average annual percentage rates 
of the 39 credit cards sampled 


5. Parameter 7. Parameter 


9. The statement “the average annual percentage rate (APR) 
[charged by credit cards] is 12.83%” is an example of 
descriptive statistics. 


An inference drawn from the sample is that all credit cards 
have an annual percentage rate of 12.83%. 


11. Quantitative, because monthly salaries are numerical 
measurements. 


13. Quantitative, because ages are numerical measurements.

15. Quantitative, because revenues are numerical
measurements.

17. Interval. The data can be ordered and meaningful
differences can be calculated, but it makes no sense to say that
100 degrees is twice as hot as 50 degrees.

19. Nominal. The data are qualitative and cannot be arranged
in a meaningful order.

21. Take a census because CEOs keep accurate records of
charitable donations.

23. Perform an experiment because you want to measure the
effect of training dogs from animal shelters on inmates.

25. The subjects could be split into male and female and then
be randomly assigned to each of the five treatment groups.

27. Answers will vary.

29. Simple random sampling is used because random
telephone numbers were generated and called.

31. Cluster sampling is used because each community is
considered a cluster and every pregnant woman in a
selected community is surveyed.

33. Stratified sampling is used because 25 students are
randomly selected from each grade level.

35. Telephone sampling samples only individuals who
have telephones, who are available, and who are
willing to respond.

37. The selected communities may not be representative of the
entire area.


Chapter Quiz for Chapter 1 (page 31) 


1. Population: Collection of the prostate conditions of all men
Sample: Collection of the prostate conditions of 20,000
men in study

2. (a) Statistic (b) Parameter (c) Statistic

3. (a) Qualitative (b) Quantitative

4. (a) Ordinal, because badge numbers can be ordered and
often indicate seniority of service, but no meaningful
mathematical computation can be performed.
(b) Ratio, because one data value can be expressed as
a multiple of another.
(c) Ordinal, because data can be arranged in order, but
the differences between data entries make no sense.
(d) Interval, because meaningful differences between
entries can be calculated but a zero entry is not an
inherent zero.

5. (a) Perform an experiment because you want to measure
the effect of a treatment on lead levels in adults.
(b) Use a survey because it would be impossible to
question everyone in the population.

6. Randomized block design

7. (a) Convenience sampling, because all of the people
sampled are in one convenient location.
(b) Systematic sampling, because every tenth machine
part is sampled.
(c) Stratified sampling, because the population is first
stratified and then a sample is collected from each
stratum.

8. Convenience sampling


Real Statistics—Real Decisions for Chapter 1 (page 32) 


1. (a) Answers will vary. (b) Yes (c) Use surveys.
(d) You may take too large a percentage of your
sample from a subgroup of the population that is
relatively small.

2. (a) Both, because questions will ask for demographics
(qualitative) as well as cost (quantitative).

(b) Gender, business/recreational: nominal
Cost of ticket: ratio
Comfort, safety: ordinal

(c) Sample

(d) Statistics

3. (a) Answers will vary. Sample answer: Sample includes
only members of the population with access
to the Internet.

(b) Answers will vary.


CHAPTER 2 


Section 2.1 (page 47) 


1. Organizing the data into a frequency distribution may
make patterns within the data more evident. Sometimes it
is easier to identify patterns of a data set by looking at a
graph of the frequency distribution.

3. Class limits determine which numbers can belong to each
class. Class boundaries are the numbers that separate classes
without forming gaps between them.

5. The sum of the relative frequencies must be 1 or 100%
because it is the sum of all portions or percentages of the data.

7. False. Class width is the difference between lower or upper
limits of consecutive classes.

9. False. An ogive is a graph that displays cumulative
frequencies.

11. Class width = 8; Lower class limits: 9, 17, 25, 33, 41, 49, 57;
Upper class limits: 16, 24, 32, 40, 48, 56, 64

13. Class width = 15; Lower class limits: 17, 32, 47, 62, 77, 92,
107, 122; Upper class limits: 31, 46, 61, 76, 91, 106, 121, 136

15. (a) Class width = 11

(b) and (c)
Class    Midpoint    Class boundaries
20-30    25          19.5-30.5
31-41    36          30.5-41.5
42-52    47          41.5-52.5
53-63    58          52.5-63.5
64-74    69          63.5-74.5
75-85    80          74.5-85.5
86-96    91          85.5-96.5
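
The class width, midpoints, and class boundaries listed for Exercise 15 follow mechanically from the class limits. The sketch below is one way to reproduce them in Python, assuming integer-valued data so that each boundary sits 0.5 beyond its limit; the helper name `class_summary` is hypothetical and not part of the text.

```python
def class_summary(lower, upper):
    """Return (midpoint, lower boundary, upper boundary) for one class.

    Assumes integer-valued data, so boundaries lie 0.5 beyond the limits.
    """
    midpoint = (lower + upper) / 2
    return midpoint, lower - 0.5, upper + 0.5


# Class limits taken from the Exercise 15 answer.
lower_limits = [20, 31, 42, 53, 64, 75, 86]
upper_limits = [30, 41, 52, 63, 74, 85, 96]

# Class width is the difference between consecutive lower limits.
class_width = lower_limits[1] - lower_limits[0]

rows = [class_summary(lo, hi) for lo, hi in zip(lower_limits, upper_limits)]

for (lo, hi), (mid, lb, ub) in zip(zip(lower_limits, upper_limits), rows):
    print(f"{lo}-{hi}: midpoint {mid}, boundaries {lb}-{ub}")
```

Note that the width comes from consecutive lower limits (31 - 20 = 11), not from the limits of a single class (30 - 20 = 10).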
17.
Class    Frequency, f   Midpoint   Relative frequency   Cumulative frequency
20-30    19             25         0.05                 19
31-41    43             36         0.12                 62
42-52    68             47         0.19                 130
53-63    69             58         0.19                 199
64-74    74             69         0.20                 273
75-85    68             80         0.19                 341
86-96    24             91         0.07                 365
Σf = 365, Σ(f/n) ≈ 1

19. (a) Number of classes = 7 (b) Least frequency ≈ 10
(c) Greatest frequency ≈ 300 (d) Class width = 10

21. (a) 50 (b) 22.5-23.5 pounds

23. (a) 42 (b) 29.5 pounds (c) 35 (d) 2

25. (a) Class with greatest relative frequency: 8-9 inches
Class with least relative frequency: 17-18 inches
(b) Greatest relative frequency ≈ 0.195
Least relative frequency ≈ 0.005
(c) Approximately 0.01

27. Class with greatest frequency: 29.5-32.5
Classes with least frequency: 11.5-14.5 and 38.5-41.5

29.
Class    Frequency, f   Midpoint   Relative frequency   Cumulative frequency
0-7      8              3.5        0.32                 8
8-15     8              11.5       0.32                 16
16-23    3              19.5       0.12                 19
24-31    3              27.5       0.12                 22
32-39    3              35.5       0.12                 25
Σf = 25, Σ(f/n) = 1

Classes with greatest frequency: 0-7, 8-15
Classes with least frequency: 16-23, 24-31, 32-39

31.
Class        Frequency, f   Midpoint   Relative frequency   Cumulative frequency
1000-2019    12             1509.5     0.5455               12
2020-3039    3              2529.5     0.1364               15
3040-4059    2              3549.5     0.0909               17
4060-5079    3              4569.5     0.1364               20
5080-6099    1              5589.5     0.0455               21
6100-7119    1              6609.5     0.0455               22
Σf = 22, Σ(f/n) ≈ 1

[Graph: July Sales for Representatives; horizontal axis: Sales (in dollars)]
The graph shows that most of the sales representatives at the
company sold between $1000 and $2019. (Answers will vary.)

33.
Class      Frequency, f   Midpoint   Relative frequency   Cumulative frequency
291-318    5              304.5      0.1667               5
319-346    4              332.5      0.1333               9
347-374    3              360.5      0.1000               12
375-402    5              388.5      0.1667               17
403-430    6              416.5      0.2000               23
431-458    4              444.5      0.1333               27
459-486    1              472.5      0.0333               28
487-514    2              500.5      0.0667               30
Σf = 30, Σ(f/n) = 1

[Graph: Reaction Times for Females; vertical axis: Frequency;
horizontal axis: Reaction time (in milliseconds)]
The graph shows that the most frequent reaction
times were between 403 and 430 milliseconds. (Answers
will vary.)
35.
Class    Frequency, f   Midpoint   Relative frequency   Cumulative frequency
24-30    9              27         0.30                 9
31-37    8              34         0.27                 17
38-44    10             41         0.33                 27
45-51    2              48         0.07                 29
52-58    1              55         0.03                 30
Σf = 30, Σ(f/n) = 1

[Graph: Gasoline Consumption; vertical axis: Relative frequency;
horizontal axis: Highway fuel consumption (in miles per gallon)]
Class with greatest relative frequency: 38-44
Class with least relative frequency: 52-58
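
The relative and cumulative frequency columns in the table above can be rebuilt from the class frequencies alone. The following Python sketch, given as an illustration rather than as the text's own method, uses the frequencies 9, 8, 10, 2, 1 from the highway fuel consumption table; the variable names are arbitrary.

```python
from itertools import accumulate

# Class frequencies from the highway fuel consumption table above.
classes = ["24-30", "31-37", "38-44", "45-51", "52-58"]
frequencies = [9, 8, 10, 2, 1]

n = sum(frequencies)  # total number of data entries

# Relative frequency = class frequency / sample size, rounded to two places.
relative = [round(f / n, 2) for f in frequencies]

# Cumulative frequency = running total of the class frequencies.
cumulative = list(accumulate(frequencies))

for row in zip(classes, frequencies, relative, cumulative):
    print(*row)
```

The last cumulative frequency always equals n, which is a quick check on a hand-built table.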


37.
Class      Frequency, f   Midpoint   Relative frequency   Cumulative frequency
138-202    12             170        0.46                 12
203-267    6              235        0.23                 18
268-332    4              300        0.15                 22
333-397    1              365        0.04                 23
398-462    3              430        0.12                 26
Σf = 26, Σ(f/n) = 1

[Graph: Triglyceride Levels; vertical axis: Relative frequency;
horizontal axis: Triglyceride level (in mg/dL)]
Class with greatest relative frequency: 138-202
Class with least relative frequency: 333-397


39.
Class    Frequency, f   Relative frequency   Cumulative frequency
52-55    3              0.125                3
56-59    3              0.125                6
60-63    9              0.375                15
64-67    4              0.167                19
68-71    4              0.167                23
72-75    1              0.042                24
Σf = 24, Σ(f/n) ≈ 1

[Graph: Retirement Ages; horizontal axis: Age]
Location of the greatest increase in frequency: 60-63
41.
Class     Frequency, f   Midpoint   Relative frequency   Cumulative frequency
47-57     1              52         0.05                 1
58-68     1              63         0.05                 2
69-79     5              74         0.25                 7
80-90     8              85         0.40                 15
91-101    5              96         0.25                 20
Σf = 20, Σ(f/n) = 1

[Graph: Exam Scores; horizontal axis: Score]
The graph shows that the most frequent exam scores
were between 80 and 90. (Answers will vary.)
43. (a)
Class      Frequency, f   Midpoint   Relative frequency   Cumulative frequency
65-74      4              69.5       0.17                 4
75-84      7              79.5       0.29                 11
85-94      4              89.5       0.17                 15
95-104     5              99.5       0.21                 20
105-114    3              109.5      0.13                 23
115-124    1              119.5      0.04                 24
Σf = 24, Σ(f/n) ≈ 1

(b)-(e) [Graphs: Pulse Rates; horizontal axis: Pulse rate]
45. [Graph: Finishing Times of Marathon Runners; vertical axis: Frequency;
horizontal axis: Finishing time (in minutes)]

[Graph: Finishing Times of Marathon Runners; vertical axis: Relative
frequency; horizontal axis: Finishing time (in minutes)]


47. (a) [Graph: Daily Withdrawals; vertical axis: Relative frequency;
horizontal axis: Amount (in hundreds of dollars)]


(b) 16.7%, because the sum of the relative frequencies for 
the last three classes is 0.167. 


(c) $9600, because the sum of the relative frequencies for 
the last two classes is 0.10. 


Section 2.2 (page 60) 
1. Quantitative: stem-and-leaf plot, dot plot, histogram, 
scatter plot, time series chart 
Qualitative: pie chart, Pareto chart 


3. Both the stem-and-leaf plot and the dot plot allow you to 
see how data are distributed, to determine specific data 
entries, and to identify unusual data values. 


5. b  6. d  7. a  8. c


9. 27, 32, 41, 43, 43, 44, 47, 47, 48, 50, 51, 51, 52, 53, 53, 53, 54, 

54, 54, 54, 55, 56, 56, 58, 59, 68, 68, 68, 73, 78, 78, 85 
Max: 85; Min: 27 

11. 13, 13, 14, 14, 14, 15, 15, 15, 15, 15, 16, 17, 17, 18, 19 
Max: 19; Min: 13 

13. Answers will vary. Sample answer: Users spend the most 
amount of time on MySpace and the least amount of time 
on Twitter. 

15. Answers will vary. Sample answer: Tailgaters irk drivers
the most, and too-cautious drivers irk drivers the least.

17. Key: 6|7 = 67

6 | 7 8
7 | 3 5 5 6 9
8 | 0 0 2 3 5 5 7 7 8
9 | 0 1 1 1 2 4 5 5


It appears that most grades for the biology midterm were 
in the 80s or 90s. (Answers will vary.) 


19. Key: 4|3 = 4.3 


4 | 3 9
5 | 1 8 8 8 9
6 | 4 8 9 9 9
7 | 0 0 2 2 2 5
8 | 0 1


It appears that most ice had a thickness of 5.8 centimeters 
to 7.2 centimeters. (Answers will vary.) 


21. [Dot plot: Systolic Blood Pressures; horizontal axis:
Systolic blood pressure (in mmHg), 100 to 200]


It appears that systolic blood pressure tends to be between 
120 and 150 millimeters of mercury. (Answers will vary.) 


23. [Pie chart: Marathon Winners' Countries of Origin. Legible labels:
United States 37.5%, Italy 10%, South Africa 5%, Great Britain 2.5%,
New Zealand 2.5%, Ethiopia 2.5%, Tanzania 2.5%, Morocco 2.5%, Mexico]
Most of the New York City Marathon winners are from
the United States and Kenya. (Answers will vary.)
25. [Graph: Barrel of Oil]
It appears that the largest portion of a 42-gallon
barrel of crude oil is used for making gasoline.
(Answers will vary.)

27. [Graph: Hourly Wages; horizontal axis: Number of hours]
It appears that there is no relation between wages and
hours worked. (Answers will vary.)

29. [Graph: Daily High Temperatures; horizontal axis: Day of the month]
It appears that it was hottest in May from May 7th to May 11th.
(Answers will vary.)

33. (a) [Pie chart: Investment Focus 2010. U.S. stocks 35.35%,
Bank accounts 32.32%, Emerging markets 16.16%, Bonds 10.1%, Commodities]
It appears a large portion of adults said that the type of
investment that they would focus on in 2010 was U.S.
stocks or bank accounts. (Answers will vary.)

(b) [Graph: Investment Focus 2010; vertical axis: Number of adults;
horizontal axis: Investment (U.S. stocks, Bank accounts, Emerging
markets, Bonds, Commodities)]
It appears that most adults said that the type of
investment that they would focus on in 2010 was
U.S. stocks or bank accounts. (Answers will vary.)


31. Variable: Scores
Decimal point is 1 digit(s) to the right of the colon.

5 : 5
6 : 2
6 : 8
7 : 01
7 : 56
8 : 023
8 : 567889
9 : 03
9 : 5589
10 : 0

It appears that most scores on the final exam in economics
were in the 80s and 90s. (Answers will vary.)

35. (a) The graph is misleading because the large gap from 0
to 90 makes it appear that the sales for the 3rd quarter
are disproportionately larger than the other quarters.
(Answers will vary.)
(b) [Graph: Sales for Company A; horizontal axis: Quarter]

37. (a) The graph is misleading because the angle makes it
appear as though the 3rd quarter had a larger percent
of sales than the others, when the 1st and 3rd quarters
have the same percent.
(b) [Pie chart: Sales for Company B. 1st quarter 38%, 2nd quarter 4%,
3rd quarter 38%, 4th quarter 20%]

39. (a) At Law Firm A, the lowest salary was $90,000 and the
highest salary was $203,000; at Law Firm B, the lowest 
salary was $90,000 and the highest salary was $190,000. 

(b) There are 30 lawyers at Law Firm A and 32 lawyers at 
Law Firm B. 

(c) At Law Firm A, the salaries tend to be clustered at the 
far ends of the distribution range and at Law Firm B, 
the salaries tend to fall in the middle of the distribution 
range. 


Section 2.3 (page 72) 


1. True  3. True  5. 1, 2, 2, 2, 3 (Answers will vary.)

7. 2, 5, 7, 9, 35 (Answers will vary.)

9. The shape of the distribution is skewed right because the
bars have a “tail” to the right.

11. The shape of the distribution is uniform because the bars
are approximately the same height.

13. (11), because the distribution of values ranges from 1 to 12
and has (approximately) equal frequencies.

15. (12), because the distribution has a maximum value of 90
and is skewed left due to a few students scoring much
lower than the majority of the students.

17. x̄ ≈ 4.9; median = 5; mode = 4

19. x̄ ≈ 11.0; median = 11.0; mode = 11.7; The mode does not
represent the center of the data because 11.7 is the largest
number in the data set.

21. x̄ ≈ 21.46; median = 21.95; mode = 20.4

23. x̄ = not possible; median = not possible;
mode = “Eyeglasses”; The mean and median cannot
be found because the data are at the nominal level of
measurement.

25. x̄ ≈ 170.63; median = 169.3; mode = none; There is no
mode because no data point is repeated.

27. x̄ = 168.7; median = 162.5; mode = 125; The mode does
not represent the center of the data because 125 is the
smallest number in the data set.

29. x̄ ≈ 14.11; median = 14.25; mode = 2.5; The mode does
not represent the center of the data because 2.5 is much
smaller than most of the data in the set.

31. x̄ ≈ 29.82; median = 32; mode = 24, 35

33. x̄ ≈ 19.5; median = 20; mode = 15

35. The data are skewed right.
A = mode, because it is the data entry that occurred most
often.
B = median, because the median is to the left of the mean
in a skewed right distribution.
C = mean, because the mean is to the right of the median
in a skewed right distribution.

37. Mode, because the data are at the nominal level of
measurement.

39. Mean, because there are no outliers.

41. 89  43. $612.73  45. 2.8  47. 87

49. 36.2 miles per gallon  51. 35.8 years old


53.
Class      Frequency, f   Midpoint
127-161    9              144
162-196    8              179
197-231    3              214
232-266    3              249
267-301    1              284
Σf = 24

[Graph: Hospital Beds; vertical axis: Frequency;
horizontal axis: Number of beds]
Positively skewed
55.
Class    Frequency, f   Midpoint
62-64    3              63
65-67    7              66
68-70    9              69
71-73    8              72
74-76    3              75
Σf = 30

[Graph: Heights of Males; horizontal axis: Height (to the nearest inch)]
Symmetric

57. (a) x̄ = 6.005; median = 6.01
(b) x̄ = 5.945; median = 6.01
(c) Mean

59. Summary statistics:
Column                 n    Mean        Median   Min   Max
Amount (in dollars)    11   112.11364   105.25   79    151.5


61. (a) x = 358, median = 375 
(b) x = 1074, median = 1125 


(c) The mean and median in part (b) are three times the 
mean and median in part (a). 


(d) If you multiply the mean and median from part (b) by 
12, you will get the mean and median of the data set in 
inches. 


63. Car A, because the midrange is the largest. 
65. (a) 49.2 


(b) x © 49.2; median = 46.5; mode = 36, 37, 51; 
midrange = 50.5 
(c) Using the trimmed mean eliminates potential outliers
that could affect the mean of the entries. 


Section 2.3 Activity (page 79) 


1. The distribution is symmetric. The mean and median both
decrease slightly. Over time, the median will decrease
dramatically and the mean will also decrease, but to a
lesser degree.

2. Neither the mean nor the median can be any of the points
that were plotted. Because there are 10 points in each
output region, the mean will fall somewhere between the
two regions. By the same logic, the median will be the
average of the greatest point between 0 and 0.75 and the
least point between 20 and 25.


Section 2.4 (page 90) 


1. The range is the difference between the maximum and
minimum values of a data set. The advantage of the range
is that it is easy to calculate. The disadvantage is that it uses
only two entries from the data set.

3. The units of variance are squared. Its units are meaningless.
(Example: dollars²)

5. {9, 9, 9, 9, 9, 9, 9}

7. When calculating the population standard deviation, you
divide the sum of the squared deviations by N, then take
the square root of that value. When calculating the sample
standard deviation, you divide the sum of the squared
deviations by n − 1, then take the square root of that value.

9. Similarity: Both estimate proportions of the data
contained within k standard deviations of the mean.
Difference: The Empirical Rule assumes the distribution
is bell-shaped; Chebychev’s Theorem makes no such
assumption.

11. Range = 7, μ = 9, σ² ≈ 4.8, σ ≈ 2.2

13. Range = 15, x̄ = 12, s² ≈ 21, s ≈ 4.6

15. 73  17. 24

19. (a) Range = 17.8 (b) Range = 39.8

21. The data set in (a) has a standard deviation of 24 and the
data set in (b) has a standard deviation of 16, because the
data in (a) have more variability.

23. Company B; An offer of $33,000 is two standard deviations
from the mean of Company A’s starting salaries, which
makes it unlikely. The same offer is within one standard
deviation of the mean of Company B’s starting salaries,
which makes the offer likely.

25. (a) Dallas: x̄ ≈ 44.28; median = 44.7; range = 11.3;
s² ≈ 18.33; s ≈ 4.28
New York City: x̄ ≈ 50.91; median = 50.6;
range = 17.8; s² ≈ 50.36; s ≈ 7.10
(b) It appears from the data that the annual salaries in New 
York City are more variable than the annual salaries in 
Dallas. The annual salaries in Dallas have a lower mean 


and a lower median than the annual salaries in New 
York City. 
27. (a) Males: x̄ ≈ 1643; median = 1679.5; range = 1087;
s² ≈ 116,477.4; s ≈ 341.3
Females: x̄ ≈ 1709.1; median = 1686.5; range = 947;
s² ≈ 91,625.0; s ≈ 302.7


(b) It appears from the data that the SAT scores for males 
are more variable than the SAT scores for females. The 
SAT scores for males have a lower mean and median 


than the SAT scores for females. 
29. (a) Greatest sample standard deviation: (ii) 


Data set (ii) has more entries that are farther away 
from the mean. 


Least sample standard deviation: (iii) 


Data set (iii) has more entries that are close to the 
mean. 


(b) The three data sets have the same mean but have 
different standard deviations. 


31. (a) Greatest sample standard deviation: (ii) 


Data set (ii) has more entries that are farther away 
from the mean. 


Least sample standard deviation: (iii)


Data set (iii) has more entries that are close to the 
mean. 


(b) The three data sets have the same mean, median, and 


mode but have different standard deviations. 
33. 68% 35. (a) 51s (b) 17 


37. $2180, $1000, $2000, $950; $2180 is very unusual because it 


is more than 3 standard deviations from the mean. 


39. 24  41. x̄ ≈ 2.1, s ≈ 1.3
43.
Class    Midpoint, x   f    xf    x − μ   (x − μ)²   (x − μ)²f
1-3      2             3    6     −6.1    37.21      111.63
4-6      5             6    30    −3.1    9.61       57.66
7-9      8             13   104   −0.1    0.01       0.13
10-12    11            7    77    2.9     8.41       58.87
13-15    14            3    42    5.9     34.81      104.43
N = 32, Σxf = 259, Σ(x − μ)²f = 332.72
μ ≈ 8.1, σ ≈ 3.2




45.
Midpoint, x   f    xf       x − x̄   (x − x̄)²   (x − x̄)²f
70.5          1    70.5     −44     1936       1936
92.5          12   1110.0   −22     484        5808
114.5         25   2862.5   0       0          0
136.5         10   1365.0   22      484        4840
158.5         2    317.0    44      1936       3872
n = 50, Σxf = 5725, Σ(x − x̄)²f = 16,456
x̄ = 114.5
s ≈ 18.33
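
The grouped-data calculation in Exercise 45 can be checked with a few lines of Python. This is a sketch of the standard grouped-data formulas (mean = Σxf / n, sample standard deviation = square root of Σ(x − x̄)²f / (n − 1)), using the midpoints and frequencies from the table above; the variable names are arbitrary.

```python
import math

# Midpoints and frequencies from the Exercise 45 table.
midpoints = [70.5, 92.5, 114.5, 136.5, 158.5]
freqs = [1, 12, 25, 10, 2]

n = sum(freqs)                                           # sample size
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))    # sum of x*f
mean = sum_xf / n                                        # grouped mean

# Sum of squared deviations from the mean, weighted by frequency.
ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))

# Sample standard deviation divides by n - 1.
s = math.sqrt(ss / (n - 1))

print(n, sum_xf, mean, ss, round(s, 2))
```

Dividing by n instead of n − 1 would give the population form, which is slightly smaller for the same data.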
47.
Class    Midpoint, x   f      xf        x − x̄    (x − x̄)²   (x − x̄)²f
0-4      2.0           22.1   44.20     −35.17   1236.93    27,336.15
5-14     9.5           43.4   412.30    −27.67   765.63     33,228.34
15-19    17.0          21.2   360.40    −20.17   406.83     8624.80
20-24    22.0          22.3   490.60    −15.17   230.13     5131.90
25-34    29.5          44.5   1312.75   −7.67    58.83      2617.94
35-44    39.5          41.3   1631.35   2.33     5.43       224.26
45-64    54.5          83.9   4572.55   17.33    300.33     25,197.69
65+      70.0          46.8   3276.00   32.83    1077.81    50,441.51
n = 325.5, Σxf = 12,100.15, Σ(x − x̄)²f = 152,802.59
x̄ ≈ 37.17
s ≈ 21.70
49. Summary statistics:
Column                 n    Mean   Variance    Std. Dev.   Median   Range   Min   Max
Amount (in dollars)    15   58.8   239.74286   15.483632   60       59      30    89

51. CV heights = (3.29 / 72.75) · 100% ≈ 4.5%
CV weights = (17.69 / 187.83) · 100% ≈ 9.4%

It appears that weight is more variable than height.
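
The coefficient of variation in Exercise 51 expresses the standard deviation as a percent of the mean, which is what makes heights and weights comparable despite different units. A short Python sketch using the values from the answer above; the function name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percent of the mean."""
    return std_dev / mean * 100


# Standard deviations and means taken from the Exercise 51 answer.
cv_heights = coefficient_of_variation(3.29, 72.75)
cv_weights = coefficient_of_variation(17.69, 187.83)

print(f"CV heights = {cv_heights:.1f}%")
print(f"CV weights = {cv_weights:.1f}%")
```

Because the CV is unitless, the larger value identifies the more variable measurement regardless of scale.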
53. (a) x̄ ≈ 41.5, s ≈ 5.3

(b) x̄ ≈ 43.6, s ≈ 5.6

(c) x̄ ≈ 3.5, s ≈ 0.4

(d) When each entry is multiplied by a constant k, the new
sample mean is k · x̄, and the new sample standard
deviation is k · s.

55. (a) Males: 249, Females: 245.4; The mean absolute
deviation is less than the sample standard deviation.

(b) Team A: 0.0315, Team B: 0.0199; The mean absolute
deviation is less than the sample standard deviation.

57. (a) P ≈ −2.61
The data are skewed left.

(b) P ≈ 4.12
The data are skewed right.

(c) P = 0
The data are symmetric.

(d) P = 1
The data are skewed right.
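
The distinction drawn earlier in this section between dividing by N (population) and by n − 1 (sample) can be verified numerically. The sketch below is illustrative only: the eight-entry data set is hypothetical, and the constant data set mirrors the all-nines answer given earlier in this section, whose standard deviation is 0.

```python
import math
import statistics


def population_std(data):
    """Divide the sum of squared deviations by N, then take the square root."""
    mu = sum(data) / len(data)
    return math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))


def sample_std(data):
    """Divide the sum of squared deviations by n - 1, then take the square root."""
    xbar = sum(data) / len(data)
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (len(data) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical entries, not from the text
constant = [9] * 7                # every entry equal: no variation

print(population_std(data), sample_std(data))
print(population_std(constant), sample_std(constant))

# The standard library offers the same two definitions.
print(statistics.pstdev(data), statistics.stdev(data))
```

For the same entries the sample form is always at least as large as the population form, because the divisor is smaller.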


Section 2.4 Activity (page 98) 


1. When a point with a value of 15 is added, the mean
remains constant and the standard deviation decreases;
When a point with a value of 20 is added, the mean is
raised and the standard deviation increases. (Answers will
vary.)

2. To get the largest standard deviation, plot four of the
points at 30 and four of the points at 40; To get the smallest
standard deviation, plot all of the points at the same
number.


Section 2.5 (page 107) 


1. The soccer team scored fewer points per game than 75% of
the teams in the league.

3. The student scored higher than 78% of the students who
took the actuarial exam.

5. The interquartile range of a data set can be used to identify
outliers because data values that are greater than
Q3 + 1.5(IQR) or less than Q1 − 1.5(IQR) are considered
outliers.

7. False. The median of a data set is a fractile, but the mean
may or may not be a fractile depending on the distribution
of the data.

9. True

11. False. The 50th percentile is equivalent to Q2.

13. False. A z-score of −2.5 is considered unusual.

15. (a) Min = 10, Q1 = 13, Q2 = 15, Q3 = 17, Max = 20

(b) IQR = 4

17. (a) Min = 900, Q1 = 1250, Q2 = 1500, Q3 = 1950,
Max = 2100

(b) IQR = 700

19. (a) Min = −1.9, Q1 = −0.5, Q2 = 0.1, Q3 = 0.7,
Max = 2.1

(b) IQR = 1.2
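
The outlier rule stated earlier in this section (values above Q3 + 1.5(IQR) or below Q1 − 1.5(IQR)) can be applied to any five-number summary. A minimal Python sketch, using the summary Min = 900, Q1 = 1250, Q3 = 1950, Max = 2100 from the answers above; the function name `outlier_fences` is hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (IQR, lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Five-number summary values from the answers above.
minimum, q1, q3, maximum = 900, 1250, 1950, 2100

iqr, lower_fence, upper_fence = outlier_fences(q1, q3)

# An entry is an outlier only if it falls outside the fences.
has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

print(iqr, lower_fence, upper_fence, has_low_outlier, has_high_outlier)
```

Here both the minimum and the maximum lie inside the fences, so this summary shows no outliers.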
21. (a) Min = 24, Q1 = 28, Q2 = 35, Q3 = 41, Max = 60
(b) [Box-and-whisker plot; marked values 24, 28, 35, 41, 60]

23. (a) Min = 1, Q1 = 4.5, Q2 = 6, Q3 = 7.5, Max = 9
(b) [Box-and-whisker plot; marked values 1, 4.5, 6, 7.5, 9]

25. None. The data are not skewed or symmetric.

27. Skewed left. Most of the data lie to the right on the
box plot.

29. Q1 = B, Q2 = A, Q3 = C, because about one quarter of
the data fall on or below 17, 18.5 is the median of the
entire data set, and about three quarters of the data fall on
or below 20.

31. (a) Q1 = 2, Q2 = 4, Q3 = 5
(b) [Box-and-whisker plot: Watching Television; marked values
0, 2, 4, 5, 9; axis: Number of hours]

33. (a) Q1 = 3, Q2 = 3.85, Q3 = 5.2
(b) [Box-and-whisker plot: Airplane Distances; marked values
1.8, 3, 3.85, 5.2, 6; axis: Distance (in miles)]

35. (a) 5 (b) 50% (c) 25%

37. A → z = −1.43
B → z = 0
C → z = 2.14
A z-score of 2.14 would be unusual.

39. (a) Statistics: z ≈ 1.71
Biology: z ≈ 0.51
(b) The student did better on the statistics test.

41. (a) Statistics: z ≈ 2.14
Biology: z ≈ 1.54
(b) The student did better on the statistics test.
43. (a) z1 = (34,000 − 35,000) / 2250 ≈ −0.44
z2 = (37,000 − 35,000) / 2250 ≈ 0.89
z3 = (30,000 − 35,000) / 2250 ≈ −2.22

The tire with a life span of 30,000 miles has an
unusually short life span.

(b) For 30,500, 2.5th percentile
For 37,250, 84th percentile
For 35,000, 50th percentile
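
The z-scores in Exercise 43 measure how many standard deviations each tire life span lies from the mean of 35,000 miles. The Python sketch below assumes a standard deviation of 2250 miles; that value is not legible in this copy and is inferred here from the three printed z-scores, so treat it as an assumption. The threshold of 2 for "unusual" follows the usage in these answers.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


MEAN = 35_000     # mean life span in miles, from the answer above
STD_DEV = 2_250   # assumption: inferred from the printed z-scores

life_spans = [34_000, 37_000, 30_000]
z_values = [round(z_score(x, MEAN, STD_DEV), 2) for x in life_spans]

# Treat a z-score more than 2 standard deviations from the mean as unusual.
unusual = [x for x, z in zip(life_spans, z_values) if abs(z) > 2]

print(z_values)
print(unusual)
```

Under the same assumption, 30,500 and 37,250 miles sit exactly 2 standard deviations below and 1 above the mean, which matches the 2.5th and 84th percentiles in part (b) under the Empirical Rule.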


45. 72 inches; 60% of the heights are below 72 inches.

47. z1 = (74 − 69.9) / 3.0 ≈ 1.37
z2 = (62 − 69.9) / 3.0 ≈ −2.63
z3 = (80 − 69.9) / 3.0 ≈ 3.37
The heights of 62 and 80 inches are unusual.

49. z = (71.1 − 69.9) / 3.0 = 0.4
About the 50th percentile

51. (a) Min = 27, Q1 = 42, Q2 = 49, Q3 = 56, Max = 82

(b) [Box-and-whisker plot: Ages of Executives; marked values
27, 42, 49, 56, 82; axis: Age]

(c) Half of the executives are between 42 and 56 years old.

(d) 49, because half of the executives are older and half
are younger.

(e) The age groups 20-29, 70-79, and 80-89 would all
be considered unusual because they lie more than two
standard deviations from the mean.

53. 33.75  55. 19.8

57. [Box-and-whisker plots: Credit Card Purchases; axis: Monthly
purchases (in dollars)]
Friend: marked values 75, 102.5, 136, 159, 190
You: marked values 28, 83, 115, 143, 215

The shape of your bill is symmetric, and the shape of your
friend’s bill is uniform.

59. 40th percentile
61. (a) 62, 95

(b) [Box-and-whisker plot; marked values 62, 72, 73, 75, 79, 80, 95]

63. (a) Summary statistics:
Column                Min   Q1    Median   Q3    Max
Weight (in pounds)    165   230   262.5    294   395

(b) [Graph: Weights of Professional Football Players;
axis: Weight (in pounds)]

(c) [Graph: Weights of Professional Football Players;
axis: Weight (in pounds)]

Uses and Abuses for Chapter 2 (page 113) 


1. Answers will vary. 


2. No, it is not ethical because it misleads the consumer 
to believe that oatmeal is more effective at lowering 


cholesterol than it may actually be. 


Review Answers for Chapter 2 (page 115) 


1.
Class    Midpoint   Boundaries   Frequency, f   Relative frequency   Cumulative frequency
8-12     10         7.5-12.5     2              0.10                 2
13-17    15         12.5-17.5    10             0.50                 12
18-22    20         17.5-22.5    5              0.25                 17
23-27    25         22.5-27.5    1              0.05                 18
28-32    30         27.5-32.5    2              0.10                 20
Σf = 20, Σ(f/n) = 1

[Graph: Liquid Volume 12-oz Cans; vertical axis: Frequency;
horizontal axis: Actual volume (in ounces)]

Class      Midpoint   Frequency, f
79-93      86         9
94-108     101        12
109-123    116        5
124-138    131        3
139-153    146        2
154-168    161        1
Σf = 32

[Graph: Rooms Reserved; horizontal axis: Number of rooms]

[Stem-and-leaf plot; stems not legible]
Key: 1|0 = 10

[Graph: Heights of Buildings; axes: Height (in feet), Number of stories]
The number of stories appears to increase with height.
11. [Graph: Location at Midnight on New Year’s Day;
horizontal axis: Location]


13. x = 29.15; median = 29.5; mode = 29.5 
15. 17.8 17. 82.1 19. Skewed 21. Skewed left 


23. Median; When a distribution is skewed left, the mean is to 
the left of the median. 


25. $2.80  27. μ = 6.9, σ = 4.6

29. x̄ = 2453.4, s = 306.1

31. Between $41.50 and $56.50  33. 30 customers

35. x̄ ≈ 2.5, s = 1.2

37. Min = 42, Q1 = 47.5, Q2 = 53, Q3 = 54, Max = 60

39. [Box-and-whisker plot: Motorcycle Fuel Economies; marked values
42, 47.5, 53, 54, 60; axis: Fuel economy (in highway miles per gallon)]

41. 4.5  43. 35% scored higher than 75.

45. Not unusual  47. Unusual


Chapter Quiz for Chapter 2 (page 119) 


1. (a)
Class      Midpoint   Class boundaries   Frequency, f   Relative frequency   Cumulative frequency
101-112    106.5      100.5-112.5        3              0.12                 3
113-124    118.5      112.5-124.5        11             0.44                 14
125-136    130.5      124.5-136.5        7              0.28                 21
137-148    142.5      136.5-148.5        2              0.08                 23
149-160    154.5      148.5-160.5        2              0.08                 25

(b) Frequency histogram and polygon
[Graph: Weekly Exercise; vertical axis: Frequency;
horizontal axis: Number of minutes]

(c) Relative frequency histogram
[Graph: Weekly Exercise; horizontal axis: Number of minutes]

(d) Skewed

(e) Key: 10|8 = 108
10 | 1 8
11 | 1 4 6 7 8 9 9
12 | 0 0 3 3 4 7 7 8
13 | 1 1 2 5 9 9
14 |
15 | 0 7

(f) [Box-and-whisker plot: Weekly Exercise; marked values
101, 117.5, 123, 131.5, 157; axis: Number of minutes]

(g) [Graph: Weekly Exercise; vertical axis: Cumulative frequency;
horizontal axis: Number of minutes]

2. 125.2, 13.0

3. (a) [Pie chart: U.S. Sporting Goods. Recreational transport 33.88%,
Equipment 31.24%, Footwear 21.58%, Clothing 13.30%]

(b) [Graph: U.S. Sporting Goods; vertical axis: Sales (in billions of
dollars); horizontal axis: Sales area]

4. (a) x̄ ≈ 751.6; median = 784.5; mode = none
The mean best describes a typical salary because there
are no outliers.

(b) Range = 575; s² ≈ 48,135.1; s ≈ 219.4

5. Between $125,000 and $185,000

6. (a) z = 3.0, unusual (b) z ≈ −6.67, very unusual
(c) z = 1.33 (d) z = −2.2, unusual

7. (a) Min = 59, Q1 = 74, Q2 = 83.5, Q3 = 88, Max = 103
(b) 14
(c) [Box-and-whisker plot: Wins for Each Team; marked values
59, 74, 83.5, 88, 103; axis: Number of wins]
Real Statistics—Real Decisions for Chapter 2 (page 120)


1. (a) Find the average cost of renting an apartment for each
area and do a comparison.

(b) The mean would best represent the data sets for the
four areas of the city.

(c) Area A: x̄ = $1005.50
Area B: x̄ = $887.00
Area C: x̄ = $881.00
Area D: x̄ = $945.50

2. (a) Construct a Pareto chart, because the data are
quantitative and a Pareto chart positions data in order
of decreasing height, with the tallest bar positioned at
the left.

(b) [Pareto chart: Cost of Monthly Rent; horizontal axis: Area]

(c) Yes. From the Pareto chart you can see that Area A has
the highest average cost of monthly rent, followed by
Area D, Area B, and Area C.

3. (a) You could use the range and sample standard
deviation for each area.

(b) Area A: s ≈ $123.07; range = $415.00
Area B: s = $144.91; range = $421.00
Area C: s = $146.21; range = $460.00
Area D: s ≈ $138.70; range = $497.00

(c) No. Area A has the lowest range and standard
deviation, so the rents in Areas B-D are more spread
out. There could be one or two inexpensive rents that
lower the means for these areas. It is possible that the
population means of Areas B-D are close to the
population mean of Area A.

4. (a) Answers will vary.

(b) Location, weather, population


Cumulative Review Answers for Chapters 1-2 (page 124)


1. Systematic sampling. A bias may enter this study if the
machine makes a consistent error.

2. Random sampling. A bias of this type of study is that the
researchers did not include people without telephones.

3. [Graph: Reason for Baggage Delay; horizontal axis: Reason for delay]

4. Parameter. All Major League Baseball players are included.

5. Statistic. The 19% is a numerical description of the 1000
voters surveyed in the United States.

6. (a) 95% (b) 38
(c) z1 = (90,500 − 83,500) / 1500 ≈ 4.67
z2 = (79,750 − 83,500) / 1500 = −2.5
z3 = (82,600 − 83,500) / 1500 = −0.6
The salaries of $90,500 and $79,750 are unusual.

7. Population: Collection of the career interests of all college
and university students
Sample: Collection of the career interests of the 195 college
and university students whose career counselors were
surveyed

8. Population: Collection of the life spans of all people
Sample: Collection of the life spans of the 232,606 people
in the study

9. Census. There are only 100 members in the Senate.

10. Experiment. An experiment could compare a control group
that has recess and a treatment group that has recess
removed.

11. Quantitative. The data are at the ratio level.

12. Qualitative. The data are at the nominal level.

13. (a) Min = 0, Q1 = 2, Q2 = 12.5, Q3 = 39, Max = 136
(b) [Box-and-whisker plot: Number of Tornadoes by State; marked
values 0, 2, 12.5, 39, 136; axis: Number of tornadoes]
(c) The distribution of the number of tornadoes is skewed
right.

14. 88.9
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15. 


(a) x̄ ≈ 5.49; median = 5.4; mode = none; Both the mean
and the median accurately describe a typical American
alligator tail length. (Answers will vary.)
(b) Range = 4.1; s² = 2.34; s = 1.53; The maximum
difference in alligator tail lengths is about 4.1 feet, and
about 68% of alligator tail lengths will fall between
3.96 feet and 7.02 feet.

16. (a) An inference drawn from the sample is that the
number of deaths due to heart disease for women will
continue to decrease.
(b) This inference may incorrectly imply that women will
have less of a chance of dying of heart disease in the
future.

17. Class | Class boundaries | Midpoint | Frequency, f | Relative frequency | Cumulative frequency
0-8 | −0.5-8.5 | 4 | 8 | 0.27 | 8
9-17 | 8.5-17.5 | 13 | 5 | 0.17 | 13
18-26 | 17.5-26.5 | 22 | 7 | 0.23 | 20
27-35 | 26.5-35.5 | 31 | 3 | 0.10 | 23
36-44 | 35.5-44.5 | 40 | 4 | 0.13 | 27
45-53 | 44.5-53.5 | 49 | 1 | 0.03 | 28
54-62 | 53.5-62.5 | 58 | 0 | 0.00 | 28
63-71 | 62.5-71.5 | 67 | 2 | 0.07 | 30
Σf = 30, Σ(f/n) ≈ 1

18. The distribution is skewed right.

19. [Relative frequency histogram: Montreal Canadiens Points Scored; horizontal axis: Number of points scored (per player)]
Class with greatest frequency: 0-8
Class with least frequency: 54-62

CHAPTER 3

Section 3.1 (page 138)

1. An outcome is the result of a single trial in a probability
experiment, whereas an event is a set of one or more
outcomes.

3. The probability of an event cannot exceed 100%.

5. The law of large numbers states that as an experiment is
repeated over and over, the probabilities found in the
experiment will approach the actual probabilities of the
event. Examples will vary.

7. False. If you roll a six-sided die six times, the probability of
rolling an even number at least once is approximately
0.984.

9. False. A probability of less than 0.05 indicates an unusual
event.

11. b 12. d 13. c 14. a

15. {A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V,
W, X, Y, Z}; 26

17. {A♥, K♥, Q♥, J♥, 10♥, 9♥, 8♥, 7♥, 6♥, 5♥, 4♥, 3♥, 2♥,
A♦, K♦, Q♦, J♦, 10♦, 9♦, 8♦, 7♦, 6♦, 5♦, 4♦, 3♦, 2♦,
A♠, K♠, Q♠, J♠, 10♠, 9♠, 8♠, 7♠, 6♠, 5♠, 4♠, 3♠, 2♠,
A♣, K♣, Q♣, J♣, 10♣, 9♣, 8♣, 7♣, 6♣, 5♣, 4♣, 3♣, 2♣};
52

19. {(A, +), (A, −), (B, +), (B, −), (AB, +), (AB, −),
(O, +), (O, −)}, where (A, +) represents positive
Rh-factor with blood type A and (A, −) represents
negative Rh-factor with blood type A; 8.

21. 1; Simple event because it is an event that consists of a
single outcome.

23. 4; Not a simple event because it is an event that consists of
more than a single outcome.

25. 204 27. 4500 29. 0.083 31. 0.667 33. 0.417

35. Empirical probability because company records were used
to calculate the frequency of a washing machine breaking
down.

37. 0.159 39. 0.000953 41. 0.042; Yes 43. 0.208; No

45. (a) 1000 (b) 0.001 (c) 0.999

47. {(SSS), (SSR), (SRS), (SRR), (RSS), (RSR),
(RRS), (RRR)}

49. {(SSR), (SRS), (RSS)}

51. (a) [Tree diagram for four trials, each resulting in S or R, giving the 16 outcomes from SSSS to RRRR]
(b) {(SSSS), (SSSR), (SSRS), (SSRR), (SRSS), (SRSR), 
(SRRS), (SRRR), (RSSS), (RSSR), (RSRS), 
(RSRR), (RRSS), (RRSR), (RRRS), (RRRR)} 
(c) {(SSSR), (SSRS), (SRSS), (RSSS) } 
53. 0.399 55. 0.040 57. 0.936 59. 0.033 61. 0.275 


63. Yes; The event in Exercise 55 can be considered unusual 
because its probability is 0.05 or less. 


65. (a) 0.5 (b) 0.25 (c) 0.25 
67. 0.795 69. 0.205 
71. (a) 0.225 (b) 0.133 


(c) 0.017; This event is unusual because its probability is 
0.05 or less. 


73. The probability of randomly choosing a tea drinker who 
does not have a college degree 


75. (a) Sum | Probability
2 | 0.028
3 | 0.056
4 | 0.083
5 | 0.111
6 | 0.139
7 | 0.167
8 | 0.139
9 | 0.111
10 | 0.083
11 | 0.056
12 | 0.028
(b) Answers will vary.
(c) Answers will vary.


77. The first game; The probability of winning the second game
is 1/11 ≈ 0.091, which is less than 1/10.


79. 13:39 = 1:3 
81. p = number of successful outcomes
q = number of unsuccessful outcomes
P(A) = (number of successful outcomes)/(total number of outcomes) = p/(p + q)
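
The two-dice probabilities listed for Exercise 75 and the "at least once in six rolls" value of about 0.984 can both be checked by enumeration. The following is a minimal sketch, assuming fair six-sided dice and independent rolls; the function names `dice_sum_distribution` and `prob_at_least_once` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def dice_sum_distribution(sides: int = 6) -> dict[int, Fraction]:
    """Classical probability of each sum when two fair dice are rolled."""
    counts: dict[int, int] = {}
    # Enumerate every equally likely ordered pair (first die, second die).
    for a, b in product(range(1, sides + 1), repeat=2):
        counts[a + b] = counts.get(a + b, 0) + 1
    total = sides * sides
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


def prob_at_least_once(p_single: float, trials: int) -> float:
    """Complement rule: 1 - P(event never occurs in `trials` independent trials)."""
    return 1 - (1 - p_single) ** trials


if __name__ == "__main__":
    for s, p in dice_sum_distribution().items():
        print(s, f"{float(p):.3f}")
    # Probability of at least one even number in six rolls of a fair die.
    print(f"{prob_at_least_once(0.5, 6):.3f}")
```

Using `Fraction` keeps the classical probabilities exact until they are rounded for display.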


Section 3.1 Activity (page 144) 


1-2. Answers will vary. 


Section 3.2 (page 150) 


1. Two events are independent if the occurrence of one of the 
events does not affect the probability of the occurrence of 
the other event, whereas two events are dependent if the 
occurrence of one of the events does affect the probability 
of the occurrence of the other event. 


3. The notation P(B|A) means the probability of B, given A. 
5. False. If two events are independent, then P(A|B) = P(A). 


7. Independent. The outcome of the first draw does not affect 
the outcome of the second draw. 


9. Dependent. The outcome of a father having hazel eyes 
affects the outcome of a daughter having hazel eyes. 


11. Dependent. The sum of the rolls depends on which numbers 
came up on the first and second rolls. 


13. Events: moderate to severe sleep apnea, high blood 
pressure; Dependent. People with moderate to severe 
sleep apnea are more likely to have high blood pressure. 


15. Events: exposure to aluminum, Alzheimer’s disease; 
Independent. Exposure to everyday sources of aluminum 
does not cause Alzheimer’s disease. 


17. (a) 0.6 (b) 0.001
(c) Dependent.
P(developing breast cancer | gene)
≠ P(developing breast cancer)
19. (a) 0.308 (b) 0.788 (c) 0.757 (d) 0.596
(e) Dependent.
P(taking a summer vacation | family owns a computer)
≠ P(taking a summer vacation)
21. (a) 0.093 (b) 0.75 


(c) No, the probability is not unusual because it is not less 
than or equal to 0.05. 


23. 0.745 
25. (a) 0.017 (b) 0.757 (c) 0.243 


(d) The event in part (a) is unusual because its probability 
is less than or equal to 0.05. 


27. (a) 0.481 (b) 0.465 (c) 0.449
(d) Dependent.
P(having less than one month’s income saved | being male)
≠ P(having less than one month’s income saved)
29. (a) 0.00000590 (b) 0.624 (c) 0.376
31. (a) 0.25 (b) 0.063 (c) 0.000977
(d) 0.237 (e) 0.763
33. (a) 0.011 (b) 0.458 35. 0.444
37. 0.167 39. (a) 0.074 (b) 0.999 41. 0.954


Section 3.3 (page 161) 


1. P(A and B) = 0 because A and B cannot occur at the
same time.

3. True

5. False. The probability that event A or event B will occur is
P(A or B) = P(A) + P(B) − P(A and B).

7. Not mutually exclusive. A student can be an athlete and on
the Dean’s list.

9. Not mutually exclusive. A public school teacher can be
female and 25 years old.


11. Mutually exclusive. A student cannot have a birthday in 
both months. 


13. (a) Not mutually exclusive. For five weeks the events 
overlapped. 
(b) 0.423 
15. (a) Not mutually exclusive. A carton can have a puncture 
and a smashed corner. 


(b) 0.126 


17. (a) 0.308 (b) 0.538 (c) 0.308
19. (a) 0.067 (b) 0.839 (c) 0.199 
21. (a) 0.949 (b) 0.388 

23. (a) 0.573 (b) 0.962 (c) 0.573 


(d) Not mutually exclusive. A male can be a nursing major. 


25. (a) 0.461 (b) 0.762 (c) 0.589 (d) 0.922 


(e) Not mutually exclusive. A female can be frequently 
involved in charity work. 


27. Answers will vary. 29. 0.55 


Section 3.3 Activity (page 166) 


1. 0.333

2. Answers will vary.

3. The theoretical probability is 0.5, so the green line should
be placed there.


Section 3.4 (page 174) 


1. The number of ordered arrangements of n objects taken r 
at a time. An example of a permutation is the number of 
seating arrangements of you and three of your friends. 


3. False. A permutation is an ordered arrangement of objects. 
5. True 7. 15,120 9. 56 11. 203,490 13. 0.030 
15. Permutation. The order of the eight cars in line matters. 


17. Combination. The order does not matter because the 
position of one captain is the same as the other. 


19. 5040 21. 720 23. 20,358,520 25. 320,089,770 
27. 50,400 29. 6240 31. 86,296,950 
33. (a) 720 (b) sample
(c) 0.0014; Yes, the event can be considered unusual
because its probability is less than or equal to 0.05.
35. (a) 12 (b) tree
(c) 0.083; No, the event cannot be considered unusual
because its probability is not less than or equal to 0.05.
37. (a) 907,200 (b) population
(c) 0.000001; Yes, the event can be considered unusual
because its probability is less than or equal to 0.05.
39. 0.005 41. (a) 0.016 (b) 0.385
43. (a) 70 (b) 16 (c) 0.086
45. (a) 67,600,000 (b) 19,656,000 (c) 0.000000015


47. (a) 120 (b) 12 (c) 12 (d) 0.4

49. 0.000022 51. 6.00 x 107 

53. (a) 658,008  (b) 0.00000152 

55. (a) 0.0002 (b) 0.0014 (c) 0.0211 (d) 0.0659 
57. 1001; 1000 

59. Team (worst team first) | 1 | 2 | 3 | 4 | 5
Probability | 0.250 | 0.199 | 0.156 | 0.119 | 0.088

Team (worst team first) | 6 | 7 | 8 | 9 | 10
Probability | 0.063 | 0.043 | 0.028 | 0.017 | 0.011

Team (worst team first) | 11 | 12 | 13 | 14
Probability | 0.008 | 0.007 | 0.006 | 0.005


Events in which any of Teams 7-14 win the first pick would 
be considered unusual because the probabilities are all less 
than or equal to 0.05. 


61. 0.314 
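
Several counts in this section sit next to words (720 with "sample", 12 with "tree", 907,200 with "population"), which is consistent with counting the distinguishable arrangements of a word's letters. The sketch below is an illustration under that assumption rather than the textbook's stated method; `distinguishable_permutations` is a hypothetical helper, while `math.perm` and `math.comb` are standard-library functions.

```python
import math
from collections import Counter


def distinguishable_permutations(word: str) -> int:
    """n! / (n1! * n2! * ... * nk!) for a word with repeated letters."""
    result = math.factorial(len(word))
    for count in Counter(word).values():
        result //= math.factorial(count)
    return result


if __name__ == "__main__":
    # Ordered arrangements of n objects taken r at a time (permutations).
    print(math.perm(9, 5))
    # Unordered selections of r objects from n (combinations).
    print(math.comb(8, 3))
    for word in ("sample", "tree", "population"):
        print(word, distinguishable_permutations(word))
```

The integer division is exact at every step because each partial quotient is a multinomial-style count.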


Uses and Abuses for Chapter 3 (page 179) 
1. (a) 0.000001 (b) 0.001 (c) 0.001 


2. The probability that a randomly chosen person owns a 
pickup or an SUV can equal 0.55 if no one in the town 
owns both a pickup and an SUV. The probability cannot 
equal 0.60 because 0.60 > 0.25 + 0.30. (Answers will vary.) 
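
The bound in the answer above follows from the Addition Rule, P(A or B) = P(A) + P(B) − P(A and B). The snippet below is a small sketch of that rule using the 0.25 and 0.30 values from the answer; the function name `prob_union` is hypothetical.

```python
def prob_union(p_a: float, p_b: float, p_a_and_b: float = 0.0) -> float:
    """Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    # The overlap can never exceed the smaller of the two probabilities.
    if not 0.0 <= p_a_and_b <= min(p_a, p_b):
        raise ValueError("P(A and B) must lie between 0 and min(P(A), P(B))")
    return p_a + p_b - p_a_and_b


if __name__ == "__main__":
    # With no overlap (mutually exclusive events) the union is largest.
    print(round(prob_union(0.25, 0.30), 2))
    # Any overlap lowers the union below 0.55, so 0.60 is unreachable.
    print(round(prob_union(0.25, 0.30, 0.10), 2))
```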


Review Exercises for Chapter 3 (page 181) 


1. Sample space:
{HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH,
HTTT, THHH, THHT, THTH, THTT, TTHH, TTHT,
TTTH, TTTT}; 4

3. Sample space:
{January, February, March, April, May, June, July, August,
September, October, November, December}; 3


5. 84 


7. Empirical probability because it is based on observations 
obtained from probability experiments. 


9. Subjective probability because it is based on opinion. 


11. Classical probability because all of the outcomes in the 
event and the sample space can be counted. 


13. 0.215 15. 1.25 x 107 17. 0.92 


19. Independent. The outcomes of the first four coin tosses do 
not affect the outcome of the fifth coin toss. 


21. Dependent. The outcome of getting high grades affects the 
outcome of being awarded an academic scholarship. 


23. 0.025; Yes, the event is unusual because its probability is 
less than or equal to 0.05. 


25. Mutually exclusive. A jelly bean cannot be both completely 
red and completely yellow. 


27. Mutually exclusive. A person cannot be registered to vote 
in more than one state. 


29. 0.60 31. 0.538 33. 0.583 35. 0.291 

37. 0.188 39. 0.703 41. 110 43. 35 

45. 254,251,200 47. 2730 49. 2380 

51. 0.00000923; unusual 

53. (a) 0.955; not unusual (b) 0.000000761; unusual
(c) 0.045; unusual (d) 0.999999239; not unusual

55. (a) 0.071; not unusual (b) 0.005; unusual
(c) 0.429; not unusual (d) 0.114; not unusual


Chapter Quiz for Chapter 3 (page 185) 


1. (a) 0.523 (b) 0.508 (c) 0.545 (d) 0.772
(e) 0.025 (f) 0.673 (g) 0.094 (h) 0.574 


2. The event in part (e) is unusual because its probability is 
less than or equal to 0.05. 


3. Not mutually exclusive. A golfer can score the best round 
in a four-round tournament and still lose the tournament. 


Dependent. One event can affect the occurrence of the 
second event. 


4. (a) 2,481,115  (b) 1 (c) 2,572,999 
5. (a) 0.964 (b) 0.000000389 (c) 0.9999996
6. 450,000 7. 657,720 


Real Statistics-—Real Decisions for Chapter 3 (page 186) 


1. (a) Answers will vary. 


(b) Use the Multiplication Rule, Fundamental Counting 
Principle, and combinations. 


2. If you played only the red ball, the probability of matching
it is 1/39. However, because you must pick five white balls,
you must get the white balls wrong. So, using the
Multiplication Rule, you get

P(matching only the red ball and not matching any of the five white balls)
= (54/59)(53/58)(52/57)(51/56)(50/55)(1/39) ≈ 0.016 ≈ 1/62.

3. The overall probability of winning a prize is determined by
calculating the number of ways to win and dividing by the
total number of outcomes.

To calculate the number of ways to win something, you
must use combinations.


CHAPTER 4 


Section 4.1 (page 197) 


1. A random variable represents a numerical value associated
with each outcome of a probability experiment.
Examples: Answers will vary.

3. No; Expected value may not be a possible value of x for
one trial, but it represents the average value of x over a
large number of trials.

5. False. In most applications, discrete random variables
represent counted data, while continuous random variables
represent measured data.

7. True

9. Discrete; Attendance is a random variable that is countable.

11. Continuous; Distance traveled is a random variable that
must be measured.

13. Discrete; The number of books in a library is a random
variable that is countable.

15. Continuous; The volume of blood drawn for a blood test is
a random variable that must be measured.

17. Discrete; The number of messages posted each month on a
social networking site is a random variable that is countable.

19. Continuous; The amount of snow that fell in Nome, Alaska
last winter is a random variable that cannot be counted.

21. (a) 0.35 (b) 0.90 23. 0.22 25. Yes


27. (a) x | P(x)
0 | 0.686
1 | 0.195
2 | 0.077
3 | 0.022
4 | 0.013
5 | 0.006
ΣP(x) ≈ 1
(b) [Histogram: Dogs per Household; horizontal axis: Number of dogs]
Skewed right
(c) 0.5, 0.8, 0.9
(d) The mean is 0.5, so the average number of dogs per
household is about 0 or 1 dog. The standard deviation
is 0.9, so most of the households differ from the mean
by no more than about 1 dog.


29. (a) x | P(x)
0 | 0.01
1 | 0.17
2 | 0.28
3 | 0.54
(b) [Histogram: Televisions per Household; horizontal axis: Number of televisions]
Skewed left
(c) 2.4, 0.6, 0.8
(d) The mean is 2.4, so the average household in the town
has about 2 televisions. The standard deviation is 0.8,
so most of the households differ from the mean by
no more than about 1 television.


31. (a b Overtime 
(a) x P(x) (b) P(x) 
0 0.031 os04 
1 0.063 B 0.257 
g 0.151 all 
i B 0.15 
3 0.297 E 010+ 
4 0.219 ol - 
ee Sererr ee 
6 0.083 aleaiats 
overtime hours 
P(x) =1 Approximately symmetric 
(c) 3.4, 2.1, 1.5 


(d) The mean is 3.4, so the average employee worked 
3.4 hours of overtime. The standard deviation is 1.5, 
so the overtime worked by most of the employees 
differed from the mean by no more than 1.5 hours. 


33. An expected value of 0 means that the money gained is 
equal to the money spent, representing the break-even 
point. 

35. (a) 5.3 (b) 3.3 (c) 1.8 (d) 5.3
(e) The expected value is 5.3, so an average student is
expected to answer about 5 questions correctly. The
standard deviation is 1.8, so most of the students’ quiz
results differ from the expected value by no more than
about 2 questions.

37. (a) 2.0 (b) 1.0 (c) 1.0 (d) 2.0
(e) The expected value is 2.0, so an average hurricane that
hits the U.S. mainland is expected to be a category 2
hurricane. The standard deviation is 1.0, so most of the
hurricanes differ from the expected value by no more
than 1 category level.

39. (a) 2.5 (b) 1.9 (c) 1.4 (d) 2.5
(e) The expected value is 2.5, so an average household is
expected to have either 2 or 3 people. The standard
deviation is 1.4, so most of the household sizes differ
from the expected value by no more than 1 or 2
people.
41. (a) 0.881 (b) 0.314 = (c) 0.294 


43. A household with three dogs is unusual because the 
probability of this event is 0.022, which is less than 0.05. 


45. —$0.05 


47. (a) x | P(x)
0 | 0.432
1 | 0.403
2 | 0.137
3 | 0.029
ΣP(x) ≈ 1
[Histogram: Computers per Household; horizontal axis: Number of computers; vertical axis: Probability]
(b) Skewed right
49. $38,800 51. 3020; 28 
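
The mean, variance, and standard deviation triples quoted in this section (for example 0.5, 0.8, 0.9 for the dogs-per-household distribution) come from the usual formulas μ = Σ x·P(x) and σ² = Σ (x − μ)²·P(x). The sketch below applies them to the dogs-per-household probabilities as read from the answer; `discrete_summary` is a hypothetical helper name.

```python
import math


def discrete_summary(dist: dict[int, float]) -> tuple[float, float, float]:
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, variance, math.sqrt(variance)


if __name__ == "__main__":
    # Probabilities for 0 through 5 dogs per household, as listed in the answer.
    dogs = {0: 0.686, 1: 0.195, 2: 0.077, 3: 0.022, 4: 0.013, 5: 0.006}
    mean, variance, sd = discrete_summary(dogs)
    print(round(mean, 1), round(variance, 1), round(sd, 1))
```

Because the listed probabilities are rounded, their sum is only approximately 1, which is why the results are compared after rounding to one decimal place.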


Section 4.2 (page 211) 


1. Each trial is independent of the other trials if the outcome 
of one trial does not affect the outcome of any of the 
other trials. 


3. (a) p = 0.50 (b) p = 0.20 (c) p = 0.80
5. (a) n = 12 (b) n = 4 (c) n = 8
As n increases, the distribution becomes more symmetric.
7. (a) x = 0, 1, 2, 3, 4, 11, 12
(b) x = 0
(c) x = 0, 1, 2, 8
9. Binomial experiment
Success: baby recovers
n = 5, p = 0.80, q = 0.20, x = 0, 1, 2, 3, 4, 5

11. Binomial experiment
Success: selecting an officer who is postponing or reducing
the amount of vacation
n = 20, p = 0.31, q = 0.69, x = 0, 1, 2, ..., 20
13. 20, 12, 3.5 15. 32.2, 23.9, 4.9
17. (a) 0.088 (b) 0.104 (c) 0.896
19. (a) 0.111 (b) 0.152 (c) 0.848
21. (a) 0.257 (b) 0.220 (c) 0.780
23. (a) 0.187 (b) 0.605 (c) 0.084
25. (a) 0.255 (b) 0.562 (c) 0.783
27. (a) n = 6, p = 0.63
x | P(x)
0 | 0.003
1 | 0.026
2 | 0.112
3 | 0.253
4 | 0.323
5 | 0.220
6 | 0.063
(b) [Histogram: Visiting the Dentist; horizontal axis: Number of adults]
Skewed left
(c) 3.8, 1.4, 1.2
(d) On average, 3.8 out of 6 adults are visiting the dentist
less because of the economy. The standard deviation
is 1.2, so most samples of 6 adults would differ from
the mean by no more than 1.2 people. The values
x = 0 and x = 1 would be unusual because their
probabilities are less than 0.05.
29. (a) n = 4, p = 0.05
x | P(x)
0 | 0.814506
1 | 0.171475
2 | 0.013538
3 | 0.000475
4 | 0.000006
(b) [Histogram: Donating Blood; horizontal axis: Number of adults]
Skewed right
(c) 0.2, 0.2, 0.4
(d) On average, 0.2 eligible adult out of every 4 gives
blood. The standard deviation is 0.4, so most samples
of four eligible adults would differ from the mean by
at most 0.4 adult.
The values x = 2, 3, and 4 would be unusual because their
probabilities are less than 0.05.

31. (a) n = 6, p = 0.37
x | P(x)
0 | 0.063
1 | 0.220
2 | 0.323
3 | 0.253
4 | 0.112
5 | 0.026
6 | 0.003
(b) 0.323 (c) 0.029


33. 2.2, 1.2
On average, 2.2 out of 6 travelers would name “crying
kids” as the most annoying. The standard deviation is 1.2,
so most samples of 6 travelers would differ from the mean
by at most 1.2 travelers. The values x = 5 and x = 6 would
be unusual because their probabilities are less than 0.05.

35. (a) 0.081 (b) 0.541
(c) 0.022; This event is unusual because its probability is
less than 0.05.

37. 0.033
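
The binomial tables in this section can be reproduced from P(x) = C(n, x)·p^x·(1 − p)^(n − x), with mean np, variance np(1 − p), and standard deviation √(np(1 − p)). The following sketch uses the n = 6, p = 0.63 case quoted above; the helper names `binomial_pmf` and `binomial_summary` are hypothetical.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_summary(n: int, p: float) -> tuple[float, float, float]:
    """Return (mean, variance, standard deviation) of a binomial distribution."""
    variance = n * p * (1 - p)
    return n * p, variance, math.sqrt(variance)


if __name__ == "__main__":
    n, p = 6, 0.63
    for x in range(n + 1):
        print(x, f"{binomial_pmf(x, n, p):.3f}")
    print(tuple(round(v, 1) for v in binomial_summary(n, p)))
```

Swapping p for 1 − p reverses the table, which is why the n = 6, p = 0.37 values are the same numbers in the opposite order.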


4.2 Activity (page 216) 


1-3. Answers will vary. 


Section 4.3 (page 222) 


1. 0.080 3. 0.062 5. 0.175 7. 0.251

9. In a binomial distribution, the value of x represents the
number of successes in n trials, and in a geometric
distribution the value of x represents the first trial that
results in a success.

11. Geometric. You are interested in counting the number of
trials until the first success.

13. Binomial. You are interested in counting the number of
successes out of n trials.

15. (a) 0.082 (b) 0.469 (c) 0.531
17. (a) 0.195 (b) 0.434 (Tech: 0.433)
(c) 0.566 (Tech: 0.567)
19. (a) 0.329 (b) 0.878 (c) 0.122
21. (a) 0.105 (b) 0.578 (c) 0.316
23. (a) 0.140
(b) 0.042; This event is unusual because its probability is
less than 0.05.
(c) 0.064

25. (a) 0.1254235482
(b) 0.1254084986; The results are approximately the same.

27. (a) 1000, 999,000, 999.5
On average you would have to play 1000 times in
order to win the lottery. The standard deviation is
999.5 times.
(b) 1000 times
Lose money. On average you would win $500 once in
every 1000 times you play the lottery. So, the net gain
would be −$500.

29. (a) 3.9, 2.0; The standard deviation is 2.0 strokes, so most
of Phil’s scores per hole differ from the mean by no
more than 2.0 strokes.
(b) 0.385
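
The lottery values 1000, 999,000, and 999.5 match the mean 1/p, variance q/p², and standard deviation √q/p of a geometric distribution with p = 1/1000. That value of p is inferred from the answer, not stated in it, so treat the sketch below as an assumption-laden check; `geometric_pmf` and `geometric_summary` are hypothetical names.

```python
import math


def geometric_pmf(x: int, p: float) -> float:
    """Probability that the first success occurs on trial x (x = 1, 2, ...)."""
    return p * (1 - p) ** (x - 1)


def geometric_summary(p: float) -> tuple[float, float, float]:
    """Return (mean, variance, standard deviation) of a geometric distribution."""
    variance = (1 - p) / p**2
    return 1 / p, variance, math.sqrt(variance)


if __name__ == "__main__":
    # Assumed success probability: one winning play in a thousand.
    mean, variance, sd = geometric_summary(1 / 1000)
    print(round(mean), round(variance), round(sd, 1))
```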


Uses and Abuses for Chapter 4 (page 225) 


1. 40, 0.081 2. 0.739; Answers will vary.

3. The probability of finding 36 adults out of 100 who prefer
Brand A is 0.059. So, the manufacturer’s claim is believable
because 0.059 > 0.05.

The probability of finding 25 adults out of 100 who prefer
Brand A is 0.000627. So, the manufacturer’s claim is not
believable.

Review Answers for Chapter 4 (page 227)


1. Continuous; The length of time spent sleeping is a random
variable that cannot be counted.

3. Discrete 5. Continuous 7. No, ΣP(x) ≠ 1. 9. Yes

11. (a) x | f | P(x)
2 | 3 | 0.005
3 | 12 | 0.018
4 | 72 | 0.111
5 | 115 | 0.177
6 | 169 | 0.260
7 | 120 | 0.185
8 | 83 | 0.128
9 | 48 | 0.074
10 | 22 | 0.034
11 | 6 | 0.009
n = 650, ΣP(x) ≈ 1
(b) [Histogram: Pages per Section; horizontal axis: Number of pages; vertical axis: Probability]
Approximately symmetric
(c) 6.4, 2.9, 1.7
(d) The mean is 6.4, so the average number of pages per
section is about 6 pages. The standard deviation is 1.7,
so most of the sections differ from the mean by no
more than about 2 pages.

13. (a) x | f | P(x)
0 | 5 | 0.020
1 | 35 | 0.140
2 | 68 | 0.272
3 | 73 | 0.292
4 | 42 | 0.168
5 | 19 | 0.076
6 | 8 | 0.032
n = 250, ΣP(x) = 1
(b) [Histogram: Cellular Phones per Household; horizontal axis: Number of cellular phones]
Approximately symmetric
(c) 2.8, 1.7, 1.3
(d) The mean is 2.8, so the average number of cellular
phones per household is about 3. The standard
deviation is 1.3, so most of the households differ
from the mean by no more than about 1 cellular phone.
15. 3.4 


17. No; In a binomial experiment, there are only two possible 
outcomes: success or failure. 


19. Yes;n = 12, p = 0.24, q = 0.76, x = 0,1,...,12 
21. (a) 0.208 (b) 0.322 (Tech: 0.321) (c) 0.114 
23. (a) 0.196 (b) 0.332 (c) 0.137 


25. (a) x | P(x)
0 | 0.125
1 | 0.323
2 | 0.332
3 | 0.171
4 | 0.044
5 | 0.005
(b) [Histogram; horizontal axis: Number of women]
Skewed right

(c) 1.7, 1.1, 1.1; The mean is 1.7, so an average of 1.7 out
of 5 women have spouses who never help with
household chores. The standard deviation is 1.1, so
most samples of 5 women differ from the mean by
no more than 1.1 women.

(d) The values x = 4 and x = 5 are unusual because their
probabilities are less than 0.05.


27. (a) x | P(x)
0 | 0.130
1 | 0.346
2 | 0.346
3 | 0.154
4 | 0.026
(b) [Histogram: Diesel Engines; horizontal axis: Number of trucks sold]
Skewed right
(c) 1.6, 1.0, 1.0; The mean is 1.6, so an average of 1.6 out of
4 trucks have diesel engines. The standard deviation is
1.0, so most samples of 4 trucks differ from the mean
by no more than 1 truck.

(d) The value x = 4 is unusual because its probability is
less than 0.05.


29. (a) 0.134 (b) 0.186 (c) 0.176 
31. (a) 0.765 (b) 0.205 (c) 0.997 (d) 0.030; unusual 


33. The probability increases as the rate increases, and 
decreases as the rate decreases. 


Chapter Quiz for Chapter 4 (page 231) 


1. (a) Discrete; The number of lightning strikes that occur
in Wyoming during the month of June is a random
variable that is countable.
(b) Continuous; The fuel (in gallons) used by the Space
Shuttle during takeoff is a random variable that has
an infinite number of possible outcomes and cannot
be counted.

2. (a) x | f | P(x)
1 | 114 | 0.400
2 | 74 | 0.260
3 | 76 | 0.267
4 | 18 | 0.063
5 | 3 | 0.011
n = 285, ΣP(x) ≈ 1
(b) [Histogram: Hurricane Intensity; horizontal axis: Intensity]
Skewed right
(c) 2.0, 1.0, 1.0
On average, the intensity of a hurricane will be 2.0. The
standard deviation is 1.0, so most hurricane intensities
will differ from the mean by no more than 1.0.
(d) 0.074

3. (a) x | P(x)
0 | 0.00001
1 | 0.00039
2 | 0.00549
3 | 0.04145
4 | 0.17618
5 | 0.39933
6 | 0.37715
(b) [Histogram: Successful Surgeries; horizontal axis: Number of patients]
Skewed left
(c) 5.1, 0.8, 0.9; The average number of successful surgeries
is 5.1 out of 6. The standard deviation is 0.9, so most
samples of 6 surgeries differ from the mean by no
more than 0.9 surgery.
(d) 0.041; Yes, this event is unusual because 0.041 < 0.05.
(e) 0.047; Yes, this event is unusual because 0.047 < 0.05.

4. (a) 0.175 (b) 0.440 (c) 0.007
5. 0.038; Yes, this event is unusual because 0.038 < 0.05.
6. 0.335; No, this event is not unusual because 0.335 > 0.05.


Real Statistics-Real Decisions for Chapter 4 (page 232) 


1. (a) Answers will vary. For instance, calculate the
probability of obtaining 0 clinical pregnancies out
of 10 randomly selected ART cycles.
(b) Binomial. The distribution is discrete because the
number of clinical pregnancies is countable.

2. n = 10, p = 0.349, P(0) = 0.014
x | P(x)
0 | 0.01367
1 | 0.07329
2 | 0.17681
3 | 0.25277
4 | 0.23714
5 | 0.15256
6 | 0.06815
7 | 0.02088
8 | 0.00420
9 | 0.00050
10 | 0.00003
Answers will vary. Sample answer: Because P(0) = 0.014,
this event is unusual but not impossible.

3. (a) Suspicious, because the probability is very small.
(b) Not suspicious, because the probability is not that
small.


CHAPTER 5 


Section 5.1 (page 244) 


1. Answers will vary.

3. 1

5. Answers will vary.
Similarities: The two curves will have the same line of
symmetry.
Differences: The curve with the larger standard deviation
will be more spread out than the curve with the smaller
standard deviation.

7. μ = 0, σ = 1

9. “The” standard normal distribution is used to describe one
specific normal distribution (μ = 0, σ = 1). “A” normal
distribution is used to describe a normal distribution with
any mean and standard deviation.

11. No, the graph crosses the x-axis.

13. Yes, the graph fulfills the properties of the normal
distribution.

15. No, the graph is skewed right.

17. It is normal because it is bell-shaped and nearly symmetric.

19. 0.0968 21. 0.0228 23. 0.4878 25. 0.5319
27. 0.005 29. 0.7422 31. 0.6387 33. 0.4979
35. 0.95 37. 0.2006 (Tech: 0.2005)

39. (a) [Histogram: Life Spans of Tires; horizontal axis: Distance (in miles); vertical axis: Frequency]
It is reasonable to assume that the life spans are
normally distributed because the histogram is
symmetric and bell-shaped.

(b) 37,234.7, 6259.2

(c) The sample mean of 37,234.7 hours is less than the
claimed mean, so, on average, the tires in the sample
lasted for a shorter time. The sample standard deviation
of 6259.2 is greater than the claimed standard deviation,
so the tires in the sample had a greater variation in life
span than the manufacturer’s claim.

41. (a) A = 105; B = 113; C = 121; D = 127
(b) −2.78; −0.56; 1.67; 3.33
(c) x = 105 is unusual because its corresponding z-score
(−2.78) lies more than 2 standard deviations from
the mean, and x = 127 is very unusual because its
corresponding z-score (3.33) lies more than
3 standard deviations from the mean.

43. (a) A = 1241; B = 1392; C = 1924; D = 2202
(b) −0.86; −0.375; 1.33; 2.22


(c) x = 2202 is unusual because its corresponding z-score
(2.22) lies more than 2 standard deviations from the
mean.
45. 0.9750 47. 0.9775 49. 0.84 51. 0.9265
53. 0.0148 55. 0.3133 57. 0.901 (Tech: 0.9011)
59. 0.0098 (Tech: 0.0099)

61. [Normal curve] The normal distribution curve
is centered at its mean (60)
and has 2 points of inflection
(48 and 72) representing μ ± σ.

63. (a) Area under curve = area of square = (1)(1) = 1 
(b) 0.25 (c) 0.4 
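
The z-scores listed for Exercise 41 follow from z = (x − μ)/σ. The answer key does not state μ and σ; the values μ = 115 and σ = 3.6 used below are inferred from the listed z-scores and should be read as an assumption. The sketch uses `statistics.NormalDist` from the standard library, and `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


if __name__ == "__main__":
    mu, sigma = 115, 3.6  # assumed; inferred from the listed z-scores
    for x in (105, 113, 121, 127):
        z = z_score(x, mu, sigma)
        # Cumulative area to the left of z under the standard normal curve.
        area = NormalDist().cdf(z)
        print(x, round(z, 2), round(area, 4))
```

A z-score beyond ±2 is flagged as unusual and beyond ±3 as very unusual, matching the wording of the answers above.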


Section 5.2 (page 252) 


1. 0.4207 3. 0.3446 5. 0.1787 (Tech: 0.1788)
7. 0.3442 (Tech: 0.3451) 9. 0.2747 (Tech: 0.2737)
11. 0.3387 (Tech: 0.3385)
13. (a) 0.0968 (b) 0.6612 (c) 0.2420
(d) No, none of the events are unusual because their
probabilities are greater than 0.05.

15. (a) 0.1867 (Tech: 0.1870) (b) 0.4171 (Tech: 0.4176)
(c) 0.0166 (Tech: 0.0167)
(d) Yes, the event in part (c) is unusual because its
probability is less than 0.05.

17. (a) 0.0228 (b) 0.927 (c) 0.0013

19. (a) 0.0073 (b) 0.7215 (Tech: 0.7218) (c) 0.0228

21. (a) 83.15% (Tech: 83.25%)
(b) 305 scores (Tech: 304 scores)

23. (a) 66.28% (Tech: 66.4%) (b) 22 men

25. (a) 99.87% (b) 1 adult

27. 1.5% (Tech: 1.51%); It is unusual for a battery to have
a life span that is more than 2065 hours because the
probability is less than 0.05.

29. (a) 0.3085 (b) 0.1499
(c) 0.0668; No, because 0.0668 > 0.05, this event is not
unusual.


31. Out of control, because there is a point more than three 
standard deviations beyond the mean. 

33. Out of control, because there are nine consecutive points 
below the mean, and two out of three consecutive points lie 
more than two standard deviations from the mean. 


Section 5.3 (page 262) 


1. −0.81 3. 2.39 5. −1.645 7. 1.555 9. −1.04
11. 1.175 13. −0.67 15. 0.67 17. −0.38

19. −0.58 21. −1.645, 1.645 23. −1.18 25. 1.18
27. −1.28, 1.28 29. −0.06, 0.06


31. (a) 68.58 inches  (b) 62.56 inches (Tech: 62.55 inches) 
33. (a) 161.72 days (Tech: 161.73 days) 
(b) 221.22 days (Tech: 221.33 days) 
35. (a) 7.75 hours (Tech: 7.74 hours) 
(b) 5.43 hours and 6.77 hours 
37. 32.61 ounces 
39. (a) 18.88 pounds (Tech: 18.90 pounds) 
(b) 12.04 pounds (Tech: 12.05 pounds) 


41. Tires that wear out by 26,800 miles (Tech: 26,796 miles) 
will be replaced free of charge. 
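
Answers in this section run the normal calculation in reverse: start from a cumulative area, find the z-score, then convert to a data value with x = μ + zσ. The sketch below shows that pattern with `statistics.NormalDist.inv_cdf`; the helper names and the μ = 100, σ = 15 values in the demonstration are hypothetical and are not taken from any exercise.

```python
from statistics import NormalDist


def z_for_cumulative_area(area: float) -> float:
    """z-score with the given cumulative area to its left (0 < area < 1)."""
    return NormalDist().inv_cdf(area)


def x_from_z(z: float, mu: float, sigma: float) -> float:
    """Transform a z-score back to a data value: x = mu + z * sigma."""
    return mu + z * sigma


if __name__ == "__main__":
    # z-scores that cut off the lowest and highest 5% of the distribution.
    print(round(z_for_cumulative_area(0.05), 3), round(z_for_cumulative_area(0.95), 3))
    # Hypothetical population: the value at the 90th percentile.
    print(round(x_from_z(z_for_cumulative_area(0.90), mu=100, sigma=15), 2))
```

Table-based answers can differ slightly from software results because the table rounds z to two decimal places, which is the source of the "Tech" variants listed above.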


Section 5.4 (page 274) 
1. 150, 3.536 3. 150, 1.581 


5. False. As the size of a sample increases, the mean of the 
distribution of sample means does not change. 

7. False. A sampling distribution is normal if either n ≥ 30
or the population is normal.

9. (c), because wz = 16.5, oy = 1.19, and the graph approxi- 
mates a normal curve. 
11. 
Sample      Mean    Sample      Mean    Sample      Mean 
2, 2, 2     2       4, 4, 8     5.33    8, 16, 2    8.67 
2, 2, 4     2.67    4, 4, 16    8       8, 16, 4    9.33 
2, 2, 8     4       4, 8, 2     4.67    8, 16, 8    10.67 
2, 2, 16    6.67    4, 8, 4     5.33    8, 16, 16   13.33 
2, 4, 2     2.67    4, 8, 8     6.67    16, 2, 2    6.67 
2, 4, 4     3.33    4, 8, 16    9.33    16, 2, 4    7.33 
2, 4, 8     4.67    4, 16, 2    7.33    16, 2, 8    8.67 
2, 4, 16    7.33    4, 16, 4    8       16, 2, 16   11.33 
2, 8, 2     4       4, 16, 8    9.33    16, 4, 2    7.33 
2, 8, 4     4.67    4, 16, 16   12      16, 4, 4    8 
2, 8, 8     6       8, 2, 2     4       16, 4, 8    9.33 
2, 8, 16    8.67    8, 2, 4     4.67    16, 4, 16   12 
2, 16, 2    6.67    8, 2, 8     6       16, 8, 2    8.67 
2, 16, 4    7.33    8, 2, 16    8.67    16, 8, 4    9.33 
2, 16, 8    8.67    8, 4, 2     4.67    16, 8, 8    10.67 
2, 16, 16   11.33   8, 4, 4     5.33    16, 8, 16   13.33 
4, 2, 2     2.67    8, 4, 8     6.67    16, 16, 2   11.33 
4, 2, 4     3.33    8, 4, 16    9.33    16, 16, 4   12 
4, 2, 8     4.67    8, 8, 2     6       16, 16, 8   13.33 
4, 2, 16    7.33    8, 8, 4     6.67    16, 16, 16  16 
4, 4, 2     3.33    8, 8, 8     8 
4, 4, 4     4       8, 8, 16    10.67 

μ = 7.5, σ ≈ 5.36 
The means are equal but the standard deviation of the 
sampling distribution is smaller. 

13. 0.9726; not unusual 15. 0.0351 (Tech: 0.0349); unusual 

17. 7.6, 0.101 
[Graph omitted; horizontal axis: Mean time (in hours)] 

19. 235, 13.864 
[Graph omitted; horizontal axis: Mean price (in dollars)] 

21. 188.4, 10.9 
[Graph omitted; horizontal axis: Mean consumption of fresh 
vegetables (in pounds)] 

23. n = 24: 7.6, 0.07; n = 36: 7.6, 0.06 
[Graph omitted; curves for n = 24 and n = 36; horizontal axis: 
Mean time (in hours)] 
As the sample size increases, the standard error decreases, 
while the mean of the sample means remains constant. 

25. 0.0003; Only 0.03% of samples of 35 specialists will have 
a mean salary less than $60,000. This is an extremely 
unusual event. 

27. 0.9078 (Tech: 0.9083); About 91% of samples of 32 gas 
stations that week will have a mean price between $2.695 
and $2.725. 

29. ≈ 0 (Tech: 0.0000002); There is almost no chance that a 
random sample of 60 women will have a mean height 
greater than 66 inches. This event is almost impossible. 

31. It is more likely to select a sample of 20 women with a 
mean height less than 70 inches because the sample of 
20 has a higher probability. 

33. Yes, it is very unlikely that you would have randomly 
sampled 40 cans with a mean equal to 127.9 ounces 
because it is more than 2 standard deviations from the 
mean of the sample means. 

35. (a) 0.0008 (b) Claim is inaccurate. 
(c) No, assuming the manufacturer’s claim is true, because 
96.25 is within 1 standard deviation of the mean for an 
individual board. 

37. (a) 0.0002 (b) Claim is inaccurate. 
(c) No, assuming the manufacturer’s claim is true, because 
49,721 is within 1 standard deviation of the mean for 
an individual tire. 

39. No, because the z-score (0.88) is not unusual. 

41. Yes, the finite correction factor should be used; 0.0003 

43. 
Sample | Number of boys from 3 births | Proportion of boys from 3 births 
bbb    | 3                            | 1 
bbg    | 2                            | 2/3 
bgb    | 2                            | 2/3 
gbb    | 2                            | 2/3 
bgg    | 1                            | 1/3 
gbg    | 1                            | 1/3 
ggb    | 1                            | 1/3 
ggg    | 0                            | 0 

45. 
Sample | Numerical representation | Sample mean 
bbb    | 111                      | 1 
bbg    | 110                      | 2/3 
bgb    | 101                      | 2/3 
gbb    | 011                      | 2/3 
bgg    | 100                      | 1/3 
gbg    | 010                      | 1/3 
ggb    | 001                      | 1/3 
ggg    | 000                      | 0 

The sample means are equal to the proportions. 
47. 0.0446 (Tech: 0.0441); About 4.5% (Tech: 4.4%) of 
samples of 105 female heart transplant patients will have 
a mean 3-year survival rate of less than 70%. Because the 
probability is less than 0.05, this is an unusual event. 
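
The sampling-distribution answers in this section (in particular Exercise 11) can be checked by brute force. The sketch below is only an illustration: it assumes the population is the four values 2, 4, 8, 16 that appear in the Exercise 11 table, and that samples of size 3 are drawn with replacement. The variable names are hypothetical.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# Assumed population, read off the sample table for Exercise 11.
population = [2, 4, 8, 16]
n = 3  # sample size used in that table

# Every ordered sample of size n drawn with replacement (4**3 = 64 samples).
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)               # population mean
sigma = pstdev(population)          # population standard deviation
mu_xbar = mean(sample_means)        # mean of the sample means
sigma_xbar = pstdev(sample_means)   # standard deviation of the sample means

print(f"mu = {mu}, sigma = {sigma:.2f}")
print(f"mean of sample means = {mu_xbar:.2f}")
print(f"std. dev. of sample means = {sigma_xbar:.2f}")
print(f"sigma / sqrt(n) = {sigma / sqrt(n):.2f}")
```

The last two printed values should agree, which is the standard-error relationship behind the statement that the sampling distribution has the same mean but a smaller standard deviation.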


Section 5.4 Activity (page 280) 


1-2. Answers will vary. 


Section 5.5 (page 287) 


1. Properties of a binomial experiment: 


(1) The experiment is repeated for a fixed number of 
independent trials. 


(2) There are two possible outcomes: success or failure. 
(3) The probability of success is the same for each trial. 


(4) The random variable x counts the number of successful 
trials. 


3. Cannot use normal distribution. 
5. Cannot use normal distribution. 
7. Cannot use normal distribution because nq < 5. 
9. Can use normal distribution; μ = 27.5, σ ≈ 3.52 
11. Cannot use normal distribution because nq < 5. 
13. a 14. d 15. c 16. b 
17. The probability of getting fewer than 25 successes; 
P(x < 24.5) 
19. The probability of getting exactly 33 successes; 
P(32.5 < x < 33.5) 
21. The probability of getting at most 150 successes; 
P(x < 150.5) 
23. Can use normal distribution. 


(a) 0.0782 (Tech: 0.0785) (b) 0.9147 (Tech: 0.9151) 
(c) 0.0853 (Tech: 0.0849) 
[Graphs omitted; horizontal axis: Number of adults] 


(d) No, none of the probabilities are less than 0.05. 
25. Can use normal distribution. 
(a) ≈ 1 (b) 0.9798 (Tech: 0.9801) 
(c) 0.6097 (Tech: 0.6109) 
[Graphs omitted; horizontal axis: Number of people] 
(d) No, none of the probabilities are less than 0.05. 


27. Can use normal distribution. 
(a) 0.0692 (Tech: 0.0691) (b) 0.8770 (Tech: 0.8771) 
(c) 0.8078 (Tech: 0.8080) (d) 0.8212 (Tech: 0.8221) 
[Graphs omitted; horizontal axis: Number of workers] 

29. (a) Can use normal distribution. 
0.0069 
(b) Can use normal distribution. 
0.3557 (Tech: 0.3545) 
(c) Can use normal distribution. 
0.0558 (Tech: 0.0595) 
[Graphs omitted; horizontal axis: Number of people] 
(d) Cannot use normal distribution because np < 5 and 
nq < 5; 0.002 


31. Binomial: 0.549; Normal: 0.5463 (Tech: 0.5466); 
The results are about the same. 


33. Highly unlikely. Answers will vary. 35. 0.1020 
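
Answers such as 17 to 21 and 31 in this section rest on the continuity correction and on the conditions np ≥ 5 and nq ≥ 5. The following is a minimal sketch of that comparison, not the textbook's own procedure; the function names and the values n = 50, p = 0.4, k = 20 are hypothetical and chosen only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a binomial random variable."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) by P(x < k + 0.5) under a normal curve."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("cannot use normal distribution: np < 5 or nq < 5")
    mu = n * p
    sigma = sqrt(n * p * q)
    # Continuity correction: extend the interval half a unit to the right.
    return NormalDist(mu, sigma).cdf(k + 0.5)


# Hypothetical example values.
n, p, k = 50, 0.4, 20
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"binomial: {exact:.4f}, normal approximation: {approx:.4f}")
```

When both conditions hold, the two printed probabilities should be close, which is the sense in which answer 31 calls the results "about the same."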


Uses and Abuses for Chapter 5 (page 291) 
1. (a) Not unusual; A sample mean of 115 is less than 2 
standard deviations from the population mean. 


(b) Not unusual; A sample mean of 105 lies within 2 
standard deviations of the population mean. 


2. The ages of students at a high school may not be normally 


distributed. 


3. Answers will vary. 


Review Answers for Chapter 5 (page 293) 
1. μ = 15, σ = 3 


3. Curve B has the greatest mean because its line of 
symmetry occurs the farthest to the right. 


5. 2.25; 0.5; 2; 3.5 7. 0.6772 9. 0.6293 11. 0.7157 
13. 0.00235 (Tech: 0.00236) 15. 0.4495 
17. 0.4365 (Tech: 0.4364) 19. 0.1336 
21. A = 8; B = 17; C = 23; D = 29 23. 0.8997 
25. 0.9236 (Tech: 0.9237) 27. 0.0124 29. 0.8944 
31. 0.2266 33. 0.2684 (Tech: 0.2685) 
35. (a) 0.3156 (b) 0.3099 (c) 0.3446 

37. No, none of the events are unusual because their 
probabilities are greater than 0.05. 

39. −0.07 41. 1.13 43. 1.04 45. 0.51 
47. 42.5 meters 49. 51.6 meters 51. 50.8 meters 


53. μ = 145, σ = 45 
μ_x̄ = 145, σ_x̄ ≈ 25.98 
The means are the same, but σ_x̄ is less than σ. 

55. 76, 3.465 
[Graph omitted; horizontal axis: Mean consumption (in pounds)] 

57. (a) 0.0485 (Tech: 0.0482) 
(b) 0.8180 
(c) 0.0823 (Tech: 0.0829) 
(a) and (c) are smaller, (b) is larger. This is to be expected 
because the standard error of the sample means is smaller. 

59. (a) 0.1867 (Tech: 0.1855) (b) ≈ 0 

61. 0.0019 (Tech: 0.0018) 

63. Cannot use normal distribution because nq < 5. 

65. P(x > 24.5) 67. P(44.5 < x < 45.5) 

69. Can use normal distribution. 
≈ 0 (Tech: 0.0002) 
[Graph omitted; horizontal axis: Children saying yes] 


Chapter Quiz for Chapter 5 (page 297) 


1. (a) 0.9945 (b) 0.9990 
(c) 0.6212 (d) 0.83685 (Tech: 0.83692) 

2. (a) 0.9198 (Tech: 0.9199) (b) 0.1940 (Tech: 0.1938) 
(c) 0.0456 (Tech: 0.0455) 

3. 0.0475 (Tech: 0.0478); Yes, the event is unusual because its 
probability is less than 0.05. 

4. 0.2586 (Tech: 0.2611); No, the event is not unusual because 
its probability is greater than 0.05. 

5. 21.19% 
6. 503 students (Tech: 505 students) 
7. 125 8. 80 

9. 0.0049; About 0.5% of samples of 60 students will have 
a mean IQ score greater than 105. This is a very unusual 
event. 

10. More likely to select one student with an IQ score greater 
than 105 because the standard error of the mean is less 
than the standard deviation. 

11. Can use normal distribution. 
μ = 28.35, σ ≈ 2.32 

12. 0.0004; This event is extremely unusual because its 
probability is much less than 0.05. 


Real Statistics-Real Decisions for Chapter 5 (page 298) 


1. (a) 0.0014 (b) 0.9495 (c) ≈ 0 
(d) There is a very high probability that at least 40 out of 
60 employees will participate, and the probability that 
fewer than 20 will participate is almost 0. 


2. (a) 0.2514 (Tech: 0.2525)  (b) 0.4972 (Tech: 0.4950) 
(c) 0.2514 (Tech: 0.2525) 
3. (a) 3; The line of symmetry occurs at x = 3. 


(b) Yes (c) Answers will vary. 


Cumulative Review Answers for Chapters 3-5 (page 300) 


1. (a) np = 75 ≥ 5, nq = 425 ≥ 5 
(b) 0.9973 
(c) Yes, because the probability is less than 0.05. 
2. (a) 3.1 (b) 1.6 (c) 1.3 (d) 3.1 
(e) The size of a family household on average is about 
3 persons. The standard deviation is 1.3, so most 


households differ from the mean by no more than 
about 1 person. 
3. (a) 3.6 (b) 1.9 (c) 1.4 (d) 3.6 
(e) The number of fouls for a player in a game on average 
is about 4 fouls. The standard deviation is 1.4, so most 


of the player’s games differ from the mean by no more 
than about 1 or 2 fouls. 


4. (a) 0.476 (b) 0.78 (c) 0.659 
5. (a) 43,680  (b) 0.0192 

6. 0.7642 7. 0.0010 8. 0.7995 
9. 0.4984 10. 0.2862 11. 0.5905 


12. (a) 0.0462; unusual, because the probability is less 
than 0.05 


(b) 0.6029 


(c) 0.0139; unusual, because the probability is less 
than 0.05 


13. (a) 0.0048 (b) 0.0149 
14. (a) 0.2777 (b) 0.8657 


(c) Dependent. P(being a public school teacher | having 
20 years or more of full-time teaching 
experience) ≠ P(being a public school teacher) 


(d) 0.8881 (e) 0.4177 
15. (a) 70, 0.1897 (b) 0.0006 (c) 0.9511 
[Graph omitted; horizontal axis: Initial pressure (in psi)] 
16. (a) 0.0548 (b) 0.6547 (c) 52.2 months 
17. (a) 495 (b) 0.0020 
18. (a) 0.0278; unusual, because the probability is less 
than 0.05 
(b) 0.2272 (c) 0.5982 


CHAPTER 6 


Section 6.1 (page 311) 


1. You are more likely to be correct using an interval 
estimate because it is unlikely that a point estimate will 
exactly equal the population mean. 


3. d; As the level of confidence increases, z_c increases, causing 
wider intervals. 


5. 1.28 7. 1.15 9. −0.47 11. 1.76 
15. 0.192 17. c 18. d 19. b 20. a 
21. (12.0, 12.6) 23. (9.7, 11.3) 25. 1.4, 13.4 
27. 0.17, 1.88 29. 126 31. 7 33. 1.95, 28.15 
35. (428.68, 476.92); (424.06, 481.54) 
With 90% confidence, you can say that the population 
mean price is between $428.68 and $476.92; with 95% 
confidence, you can say that the population mean price is 
between $424.06 and $481.54. The 95% CI is wider. 

37. (87.0, 111.6); (84.7, 113.9) 
With 90% confidence, you can say that the population 
mean is between 87.0 and 111.6 calories; with 95% 
confidence, you can say that the population mean is 
between 84.7 and 113.9 calories. The 95% CI is wider. 

39. (2532.20, 2767.80) 
With 95% confidence, you can say that the population 
mean cost is between $2532.20 and $2767.80. 

41. (2556.87, 2743.13) [Tech: (2556.90, 2743.10)] 
The n = 50 CI is wider because a smaller sample is taken, 
giving less information about the population. 

43. (3.09, 3.15) 


With 95% confidence, you can say that the population 
mean time is between 3.09 and 3.15 minutes. 
45. (3.10, 3.14) 


The s = 0.09 CI is wider because of the increased 
variability within the sample. 


13. 1.861 


47. (a) An increase in the level of confidence will widen the 
confidence interval. 


(b) An increase in the sample size will narrow the 
confidence interval. 


(c) An increase in the standard deviation will widen the 
confidence interval. 


49. (14.6, 15.6); (14.4, 15.8) 


With 90% confidence, you can say that the population 
mean length of time is between 14.6 and 15.6 minutes; 
with 99% confidence, you can say that the population 
mean length of time is between 14.4 and 15.8 minutes. 
The 99% CI is wider. 
51. 
53. 


55. 


57. 


59. 


61. 


63. 


65. 
(a) 121 servings (b) 208 servings 
(c) The 99% CI requires a larger sample because more 
information is needed from the population to be 99% 
confident. 

(a) 32 cans (b) 87 cans 
E = 0.15 requires a larger sample size. As the error size 
decreases, a larger sample must be taken to obtain enough 
information from the population to ensure the desired 
accuracy. 

(a) 16 sheets (b) 62 sheets 
E = 0.0625 requires a larger sample size. As the error size 
decreases, a larger sample must be taken to obtain enough 
information from the population to ensure the desired 
accuracy. 

(a) 42 soccer balls (b) 60 soccer balls 
σ = 0.3 requires a larger sample size. Due to the increased 
variability in the population, a larger sample size is needed 
to ensure the desired accuracy. 

(a) An increase in the level of confidence will increase the 
minimum sample size required. 
(b) An increase (larger E) in the error tolerance will 
decrease the minimum sample size required. 
(c) An increase in the population standard deviation will 
increase the minimum sample size required. 


(212.8, 221.4) 


With 95% confidence, you can say that the population 
mean airfare price is between $212.8 and $221.4. 


80% confidence interval results: 
μ : population mean 
Standard deviation = 344.9 

Mean | n  | Sample Mean | Std. Err. | L. Limit | U. Limit 
μ    | 30 | 1042.7      | 62.969837 | 962.0009 | 1123.399 


90% confidence interval results: 
μ : population mean 
Standard deviation = 344.9 

Mean | n  | Sample Mean | Std. Err. | L. Limit  | U. Limit 
μ    | 30 | 1042.7      | 62.969837 | 939.12384 | 1146.2761 


95% confidence interval results: 
μ : population mean 
Standard deviation = 344.9 

Mean | n  | Sample Mean | Std. Err. | L. Limit | U. Limit 
μ    | 30 | 1042.7      | 62.969837 | 919.2814 | 1166.1187 


67. 


69. 


With 80% confidence, you can say that the population 
mean sodium content is between 962.0 and 1123.4 
milligrams; with 90% confidence, you can say it is between 
939.1 and 1146.3 milligrams; with 95% confidence, you can 
say it is between 919.3 and 1166.1 milligrams. 
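
The three sodium-content intervals can be reproduced from the summary values in the technology output (n = 30, sample mean 1042.7, σ = 344.9). The sketch below assumes a z-interval with known σ; the function name `z_interval` is hypothetical and is not taken from the textbook or from any statistics package.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for a mean when sigma is known."""
    # Critical value z_c leaves (1 - confidence) / 2 in each tail.
    z_c = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_c * sigma / sqrt(n)  # margin of error E
    return xbar - margin, xbar + margin


# Summary values taken from the output above.
for level in (0.80, 0.90, 0.95):
    low, high = z_interval(1042.7, 344.9, 30, level)
    print(f"{level:.0%}: ({low:.1f}, {high:.1f})")
```

Because the code uses unrounded critical values, its limits follow the technology output more closely than a hand calculation with z_c rounded to two or three decimals would.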


(a) 0.707 (b) 0.949 (c) 0.962 (d) 0.975 


(e) The finite population correction factor approaches 
1 as the sample size decreases and the population size 
remains the same. 


Sample answer: 


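$E = \dfrac{z_c \sigma}{\sqrt{n}}$   Write original equation. 
$E\sqrt{n} = z_c \sigma$   Multiply each side by $\sqrt{n}$. 
$\sqrt{n} = \dfrac{z_c \sigma}{E}$   Divide each side by $E$. 
$n = \left(\dfrac{z_c \sigma}{E}\right)^{2}$   Square each side. 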
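
The minimum-sample-size formula derived above can be turned into a short calculation. This is a sketch under stated assumptions: `min_sample_size` is a hypothetical helper name, and the inputs σ = 1.0 and E = 0.25 are illustrative values that do not come from any exercise in this section.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(confidence, sigma, error):
    """Smallest n with z_c * sigma / sqrt(n) <= error (sigma known)."""
    z_c = NormalDist().inv_cdf((1 + confidence) / 2)
    # n = (z_c * sigma / E)^2, rounded up to the next whole number.
    return ceil((z_c * sigma / error) ** 2)


# Illustrative inputs only.
print(min_sample_size(0.95, 1.0, 0.25))
print(min_sample_size(0.99, 1.0, 0.25))   # higher confidence needs a larger n
print(min_sample_size(0.95, 1.0, 0.125))  # smaller E needs a larger n
```

A hand calculation that rounds z_c first (for example to 1.96) can occasionally land on a different whole number than this unrounded version.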


Section 6.2 (page 323) 


1. 1.833 3. 2.947 5. 2.7 7. 1.2 

9. (a) (10.9, 14.1) (b) The t-CI is wider. 

11. (a) (4.1, 4.5) 
(b) When rounded to the nearest tenth, the normal CI and 
the t-CI have the same width. 

13. 3.7, 18.4 

15. 9.5, 74.1 

17. 6.0; (29.5, 41.5); With 95% confidence, you can say that the 
population mean commute time is between 29.5 and 41.5 
minutes. 

19. 6.4; (29.1, 41.9); With 95% confidence, you can say that the 
population mean commute time is between 29.1 and 41.9 
minutes. This confidence interval is slightly wider than the 
one found in Exercise 17. 

21. (a) (3.80, 5.20) 
(b) (4.41, 4.59); The t-CI in part (a) is wider. 

23. (a) 90,182.9 (b) 3724.9 (c) (87,438.6, 92,927.2) 

25. (a) 1767.7 (b) 252.2 (c) (1541.5, 1993.8) 

27. Use a normal distribution because n ≥ 30. 
(26.0, 29.4); With 95% confidence, you can say that the 
population mean BMI is between 26.0 and 29.4. 

29. Use a t-distribution because n < 30, the miles per gallon 
are normally distributed, and σ is unknown. 
(20.5, 23.3) [Tech: (20.5, 23.4)]; With 95% confidence, you 
can say that the population mean is between 20.5 and 
23.3 miles per gallon. 

31. Cannot use normal or t-distribution because n < 30 and 
the times are not normally distributed. 
90% confidence interval results: 
μ : mean of Variable 

Variable        | Sample Mean | Std. Err. | DF | L. Limit  | U. Limit 
Time (in hours) | 12.194445   | 0.4136141 | 17 | 11.474918 | 12.91397 

95% confidence interval results: 
μ : mean of Variable 

Variable        | Sample Mean | Std. Err. | DF | L. Limit  | U. Limit 
Time (in hours) | 12.194445   | 0.4136141 | 17 | 11.321795 | 13.067094 

99% confidence interval results: 
μ : mean of Variable 

Variable        | Sample Mean | Std. Err. | DF | L. Limit  | U. Limit 
Time (in hours) | 12.194445   | 0.4136141 | 17 | 10.995695 | 13.393193 


With 90% confidence, you can say that the population 
mean time spent on homework is between 11.5 and 12.9 
hours; with 95% confidence, you can say it is between 11.3 
and 13.1 hours; and with 99% confidence, you can say it is 
between 11.0 and 13.4 hours. As the level of confidence 
increases, the intervals get wider. 
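
The homework-time intervals follow the t-interval pattern x̄ ± t_c · (standard error). The sketch below reuses the sample mean and standard error from the output for this exercise. The critical values are standard t-table entries for 17 degrees of freedom, hard-coded here as an assumption because the Python standard library has no inverse t-distribution; the names are hypothetical.

```python
# Two-sided critical values t_c for d.f. = 17, copied from a standard
# t-table (assumed values, rounded to three decimals).
T_CRITICAL_DF17 = {0.90: 1.740, 0.95: 2.110, 0.99: 2.898}


def t_interval(xbar, std_err, t_c):
    """Confidence interval xbar +/- t_c * (s / sqrt(n))."""
    margin = t_c * std_err
    return xbar - margin, xbar + margin


xbar, std_err = 12.194445, 0.4136141  # from the technology output
for level, t_c in T_CRITICAL_DF17.items():
    low, high = t_interval(xbar, std_err, t_c)
    print(f"{level:.0%}: ({low:.1f}, {high:.1f})")
```

Because the table values are rounded, the limits agree with the technology output only to about three decimal places, which is enough to confirm the rounded intervals in the interpretation.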


No; They are not making good tennis balls because the 
desired bounce height of 55.5 inches is not between 55.9 
and 56.1 inches. 


Activity 6.2 (page 326) 


1-2. Answers will vary. 


Section 6.3 (page 332) 


1. False. To estimate the value of p, the population proportion 
of successes, use the point estimate p̂ = x/n. 


3. 0.750, 0.250 5. 0.423, 0.577 
7. E = 0.014, p̂ = 0.919 9. E = 0.042, p̂ = 0.554 
11. (0.557, 0.619) [Tech: (0.556, 0.619)]; 


(0.551, 0.625) [Tech: (0.550, 0.625)]; 


With 90% confidence, you can say that the population 
proportion of U.S. males ages 18-64 who say they have 
gone to the dentist in the past year is between 55.7% 
(Tech: 55.6%) and 61.9%; with 95% confidence, you can 
say it is between 55.1% (Tech: 55.0%) and 62.5%. The 
95% confidence interval is slightly wider. 
13. (0.438, 0.484); 
With 99% confidence, you can say that the population 
proportion of U.S. adults who say they have started paying 
bills online in the last year is between 43.8% and 48.4%. 

15. (0.622, 0.644) 

17. (a) 601 adults (b) 413 adults 
(c) Having an estimate of the population proportion 
reduces the minimum sample size needed. 

19. (a) 752 adults (b) 483 adults 
(c) Having an estimate of the population proportion 
reduces the minimum sample size needed. 

21. (a) (0.234, 0.306) 
(b) (0.450, 0.530) 
(c) (0.275, 0.345) 

23. (a) (0.274, 0.366) (b) (0.511, 0.609) 

25. No, it is unlikely that the two proportions are equal 
because the confidence intervals estimating the proportions 
do not overlap. The 99% confidence intervals are 
(0.260, 0.380) and (0.496, 0.624). Although these intervals 
are wider, they still do not overlap. 

27. 90% confidence interval results: 

p : proportion of successes for population 
Method: Standard-Wald 

Proportion | Count | Total | Sample Prop. | Std. Err.   | L. Limit  | U. Limit 
p          | 802   | 1025  | 0.78243905   | 0.012887059 | 0.7612417 | 0.8036364 


95% confidence interval results: 
p : proportion of successes for population 
Method: Standard-Wald 

Proportion | Count | Total | Sample Prop. | Std. Err.   | L. Limit   | U. Limit 
p          | 802   | 1025  | 0.78243905   | 0.012887059 | 0.75718087 | 0.8076972 


99% confidence interval results: 
p : proportion of successes for population 
Method: Standard-Wald 

Proportion | Count | Total | Sample Prop. | Std. Err.   | L. Limit   | U. Limit 
p          | 802   | 1025  | 0.78243905   | 0.012887059 | 0.74924415 | 0.8156339 
31. 


33. 
With 90% confidence, you can say that the population 
proportion of U.S. adults who disapprove of the job 
Congress is doing is between 76.1% and 80.4%; with 95% 
confidence, you can say it is between 75.7% and 80.8%; 
and with 99% confidence, you can say it is between 74.9% 
and 81.6%. As the level of confidence increases, the 
intervals get wider. 
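
The three intervals for the proportion who disapprove can be recomputed from the counts in the technology output (802 successes out of 1025). The sketch assumes the standard Wald interval named in that output; `wald_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence):
    """Standard (Wald) confidence interval for a population proportion."""
    p_hat = successes / n                      # point estimate
    std_err = sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
    z_c = NormalDist().inv_cdf((1 + confidence) / 2)
    return p_hat - z_c * std_err, p_hat + z_c * std_err


for level in (0.90, 0.95, 0.99):
    low, high = wald_interval(802, 1025, level)
    print(f"{level:.0%}: ({low:.3f}, {high:.3f})")
```

The widening of the printed intervals from 90% to 99% is the effect described in the interpretation above.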


(0.304, 0.324) is approximately a 97.6% CI. 

If np < 5 or nq < 5, the sampling distribution of p̂ may 
not be normally distributed, so z_c cannot be used to 
calculate the confidence interval. 


p | q = 1 − p | pq | p | q = 1 − p | pq 
0.0 1.0 0.00 0.45 0.55 0.2475 
0.1 0.9 0.09 0.46 0.54 0.2484 
0.2 0.8 0.16 0.47 0.53 0.2491 
0.3 0.7 0.21 0.48 0.52 0.2496 
0.4 0.6 0.24 0.49 0.51 0.2499 
0.5 0.5 0.25 0.50 0.50 0.2500 
0.6 0.4 0.24 0.51 0.49 0.2499 
0.7 0.3 0.21 0.52 0.48 0.2496 
0.8 0.2 0.16 0.53 0.47 0.2491 
0.9 0.1 0.09 0.54 0.46 0.2484 
1.0 0.0 0.00 0.55 0.45 0.2475 


p = 0.5 gives the maximum value of pq. 
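
The pattern in the table can also be confirmed algebraically. The short derivation below is added as a supporting sketch; it uses only the definition q = 1 − p from the table header.

```latex
pq = p(1 - p) = \frac{1}{4} - \left(p - \frac{1}{2}\right)^{2} \le \frac{1}{4},
\qquad \text{with equality only when } p = \frac{1}{2}.
```

Since the squared term is never negative, pq cannot exceed 0.25, which matches the largest entry in the table.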


Activity 6.3 (page 336) 


1-2. Answers will vary. 


Section 6.4 (page 341) 


1. Yes 
3. 14.067, 2.167 5. 32.852, 8.907 7. 52.336, 13.121 

9. (a) (0.0000413, 0.000157) (b) (0.00643, 0.0125) 
With 90% confidence, you can say that the population 
variance is between 0.0000413 and 0.000157, and the 
population standard deviation is between 0.00643 and 
0.0125 milligram. 

11. (a) (0.0305, 0.191) (b) (0.175, 0.438) 
With 99% confidence, you can say that the population 
variance is between 0.0305 and 0.191, and the population 
standard deviation is between 0.175 and 0.438 hour. 

13. (a) (6.63, 55.46) (b) (2.58, 7.45) 
With 99% confidence, you can say that the population 
variance is between 6.63 and 55.46, and the population 
standard deviation is between 2.58 and 7.45 dollars 
per year. 

15. (a) (380.0, 3942.6) (b) (19.5, 62.8) 
With 98% confidence, you can say that the population 
variance is between 380.0 and 3942.6, and the population 
standard deviation is between $19.5 and $62.8. 

17. (a) (22.5, 98.7) (b) (4.7, 9.9) 
With 95% confidence, you can say that the population 
variance is between 22.5 and 98.7, and the population 
standard deviation is between 4.7 and 9.9 beats per minute. 


19. (a) (128, 492) (b) (11, 22) 
With 95% confidence, you can say that the population 
variance is between 128 and 492, and the population 
standard deviation is between 11 and 22 grains per gallon. 

21. (a) (9,104,741, 25,615,326) (b) (3017, 5061) 
With 80% confidence, you can say that the population 
variance is between 9,104,741 and 25,615,326, and the 
population standard deviation is between $3017 and $5061. 

23. (a) (7.0, 30.6) (b) (2.6, 5.5) 
With 98% confidence, you can say that the population 
variance is between 7.0 and 30.6, and the population 
standard deviation is between 2.6 and 5.5 minutes. 

25. 95% confidence interval results: 
σ² : variance of Variable 

Variance | Sample Var. | DF | L. Limit | U. Limit 
σ²       | 11.56       | 29 | 7.332092 | 20.891039 

(2.71, 4.57) 

27. 90% confidence interval results: 
σ² : variance of Variable 

Variance | Sample Var. | DF | L. Limit | U. Limit 
σ²       | 1225        | 17 | 754.8815 | 2401.4731 

(27, 49) 

29. Yes, because all of the values in the confidence interval are 
less than 0.015. 

31. Answers will vary. Sample answer: Unlike a confidence 
interval for a population mean or proportion, a confidence 
interval for a population variance does not have a margin 
of error. The left and right endpoints must be calculated 
separately. 
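
The remark that the two endpoints must be calculated separately can be illustrated with the 95% variance output in this section (sample variance 11.56, d.f. = 29). The sketch below assumes the usual chi-square interval; the critical values 45.722 and 16.047 are standard chi-square table entries for d.f. = 29, hard-coded as an assumption because the Python standard library has no inverse chi-square function. `variance_interval` is a hypothetical name.

```python
from math import sqrt


def variance_interval(sample_var, df, chi2_right, chi2_left):
    """Confidence interval for a population variance.

    Each endpoint divides (n - 1) * s^2 by a different critical value,
    so the interval is not of the form estimate +/- margin of error.
    """
    lower = df * sample_var / chi2_right
    upper = df * sample_var / chi2_left
    return lower, upper


# Assumed table values for a 95% interval with d.f. = 29.
var_low, var_high = variance_interval(11.56, 29, 45.722, 16.047)
sd_low, sd_high = sqrt(var_low), sqrt(var_high)
print(f"variance: ({var_low:.2f}, {var_high:.2f})")
print(f"standard deviation: ({sd_low:.2f}, {sd_high:.2f})")
```

Taking square roots of the variance endpoints gives the interval for the standard deviation, which is how the paired (a) and (b) answers in this section relate to each other.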


Uses and Abuses for Chapter 6 (page 344) 


1-2. Answers will vary. 


Review Answers for Chapter 6 (page 346) 


1. (a) 103.5 (b) 9.0 3. (15.6, 16.0) 5. 1.675, 22.425 
7. 47 people 9. 49 people 11. 1.383 13. 2.624 
15. n = 20 17. 11.2 19. 0.7 21. (60.9, 83.3) 
23. (6.1, 7.5) 25. (2050, 2386) 27. 0.81, 0.19 
29. 0.540, 0.460 31. 0.140, 0.860 33. 0.490, 0.510 

35. (0.790, 0.830) 
With 95% confidence, you can say that the population 
proportion of U.S. adults who say they will participate in 
the 2010 Census is between 79.0% and 83.0%. 

37. (0.514, 0.566) [Tech: (0.514, 0.565)] 
With 90% confidence, you can say that the population 
proportion of U.S. adults who say they have worked the 
night shift at some point in their lives is between 51.4% 
and 56.6% (Tech: 56.5%). 



39. (0.112, 0.168) 
With 99% confidence, you can say that the population 
proportion of U.S. adults who say that the cost of 
healthcare is the most important financial problem facing 
their family today is between 11.2% and 16.8%. 

41. (0.466, 0.514) 
With 80% confidence, you can say that the population 
proportion of parents with kids 4 to 8 years old who say 
they know their state booster seat law is between 46.6% 
and 51.4%. 

43. (a) 385 adults (b) 359 adults 
(c) Having an estimate of the population proportion 
reduces the minimum sample size needed. 

45. 23.337, 4.404 47. 14.067, 2.167 

49. (27.2, 113.5); (5.2, 10.7) 51. (0.80, 3.07); (0.89, 1.75) 


Chapter Quiz for Chapter 6 (page 349) 


1. (a) 6.85 
(b) 0.65; You are 95% confident that the margin of error 
for the population mean is about 0.65 minute. 
(c) (6.20, 7.50) 


With 95% confidence, you can say that the population 
mean amount of time is between 6.20 and 7.50 minutes. 


2. 39 college students 
3. (a) 33.11; 2.38 
(b) (31.73, 34.49) 


With 90% confidence, you can say that the population 
mean time played in the season is between 31.73 and 
34.49 minutes. 

(c) (30.38, 35.84) 


With 90% confidence, you can say that the population 
mean time played in the season is between 30.38 and 
35.84 minutes. This confidence interval is wider than 
the one found in part (b). 


4. (6510, 7138) 
5. (a) 0.780  (b) (0.762, 0.798) [Tech: (0.762, 0.799)] 
(c) 712 adults 


6. (a) (2.10,5.99)  (b) (1.45, 2.45) 


Real Statistics—Real Decisions for Chapter 6 (page 350) 


1. (a) Yes, there has been a change in the mean concentration 
level because the confidence interval for Year 1 does 
not overlap the confidence interval for Year 2. 


(b) No, there has not been a change in the mean 
concentration level because the confidence interval 
for Year 2 overlaps the confidence interval for Year 3. 


(c) Yes, there has been a change in the mean concentration 
level because the confidence interval for Year 1 does 
not overlap the confidence interval for Year 3. 


2. The concentrations of cyanide in the drinking water have 
increased over the three-year period. 




3. The width of the confidence interval for Year 2 may have 
been caused by greater variation in the levels of cyanide 
than in the other years, which may be the result of outliers. 


4. (a) The sampling distribution of the sample means was 
used because the “mean concentration” was used. The 
sample mean is the most unbiased point estimate of the 
population mean. 


(b) No, because typically σ is unknown. They could have 
used the sample standard deviation. 


CHAPTER 7 


Section 7.1 (page 367) 
1. The two types of hypotheses used in a hypothesis test are 
the null hypothesis and the alternative hypothesis. 


The alternative hypothesis is the complement of the null 
hypothesis. 


3. You can reject the null hypothesis, or you can fail to reject 
the null hypothesis. 


5. False. In a hypothesis test, you assume the null hypothesis 
is true. 


7. True 


9. False. A small P-value in a test will favor rejection of the 
null hypothesis. 


11. H₀: μ ≤ 645 (claim); Hₐ: μ > 645 
13. H₀: σ = 5; Hₐ: σ ≠ 5 (claim) 
15. H₀: p ≥ 0.45; Hₐ: p < 0.45 (claim) 


17. c; H₀: μ = 3 18. d; H₀: μ = 3 
19. b; H₀: μ = 3 20. a; H₀: μ = 2 
[Number-line graphs omitted; each axis is marked 1, 2, 3, 4] 


21. Right-tailed 23. Two-tailed 
25. μ > 750 
H₀: μ ≤ 750; Hₐ: μ > 750 (claim) 
27. σ ≤ 320 
H₀: σ ≤ 320 (claim); Hₐ: σ > 320 
29. μ < 45 
H₀: μ ≥ 45; Hₐ: μ < 45 (claim) 
31. A type I error will occur if the actual proportion of new 
customers who return to buy their next piece of furniture is 
at least 0.60, but you reject H₀: p ≥ 0.60. 

A type II error will occur if the actual proportion of new 
customers who return to buy their next piece of furniture is 
less than 0.60, but you fail to reject H₀: p ≥ 0.60. 

33. A type I error will occur if the actual standard deviation of 
the length of time to play a game is less than or equal to 
12 minutes, but you reject H₀: σ ≤ 12. 

A type II error will occur if the actual standard deviation 
of the length of time to play a game is greater than 
12 minutes, but you fail to reject H₀: σ ≤ 12. 
35. A type I error will occur if the actual proportion of 
applicants who become police officers is at most 0.20, but 
you reject H₀: p ≤ 0.20. 

A type II error will occur if the actual proportion of 
applicants who become police officers is greater than 0.20, 
but you fail to reject H₀: p ≤ 0.20. 

37. H₀: The proportion of homeowners who have a home 
security alarm is greater than or equal to 14%. 
Hₐ: The proportion of homeowners who have a home 
security alarm is less than 14%. 
H₀: p ≥ 0.14; Hₐ: p < 0.14 
Left-tailed because the alternative hypothesis contains <. 

39. H₀: The standard deviation of the 18-hole scores for a 
golfer is greater than or equal to 2.1 strokes. 
Hₐ: The standard deviation of the 18-hole scores for a 
golfer is less than 2.1 strokes. 
H₀: σ ≥ 2.1; Hₐ: σ < 2.1 
Left-tailed because the alternative hypothesis contains <. 

41. H₀: The mean length of the baseball team’s games is 
greater than or equal to 2.5 hours. 
Hₐ: The mean length of the baseball team’s games is less 
than 2.5 hours. 
H₀: μ ≥ 2.5; Hₐ: μ < 2.5 
Left-tailed because the alternative hypothesis contains <. 

43. (a) There is enough evidence to support the scientist’s 
claim that the mean incubation period for swan eggs is 
less than 40 days. 
(b) There is not enough evidence to support the scientist’s 
claim that the mean incubation period for swan eggs is 
less than 40 days. 

45. (a) There is enough evidence to support the U.S. 
Department of Labor’s claim that the proportion 
of full-time workers earning over $450 per week is 
greater than 75%. 
(b) There is not enough evidence to support the U.S. 
Department of Labor’s claim that the proportion 
of full-time workers earning over $450 per week is 
greater than 75%. 

47. (a) There is enough evidence to support the researcher’s 
claim that the proportion of people who have had no 
health care visits in the past year is less than 17%. 
(b) There is not enough evidence to support the 
researcher’s claim that the proportion of people who 
have had no health care visits in the past year is less 
than 17%. 

49. H₀: μ ≥ 60; Hₐ: μ < 60 

51. (a) H₀: μ ≥ 15; Hₐ: μ < 15 
(b) H₀: μ ≤ 15; Hₐ: μ > 15 

53. If you decrease α, you are decreasing the probability 
that you will reject H₀. Therefore, you are increasing the 
probability of failing to reject H₀. This could increase β, 
the probability of failing to reject H₀ when H₀ is false. 


55. Yes; If the P-value is less than α = 0.05, it is also less than 
α = 0.10. 


57. (a) Fail to reject H₀ because the confidence interval 
includes values greater than 70. 


(b) Reject H₀ because the confidence interval is located 
entirely to the left of 70. 


(c) Fail to reject H₀ because the confidence interval 
includes values greater than 70. 


59. (a) Reject H₀ because the confidence interval is located 
entirely to the right of 0.20. 


(b) Fail to reject H₀ because the confidence interval 
includes values less than 0.20. 


(c) Fail to reject H₀ because the confidence interval 
includes values less than 0.20. 


Section 7.2 (page 381) 


1. In the z-test using rejection region(s), the test statistic is 
compared with critical values. The z-test using a P-value 
compares the P-value with the level of significance α. 
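
The P-value approach described in answer 1 can be sketched in a few lines. The example inputs are the summary values from the first technology output reported later in this section's answers (hypothesized mean 58, sample mean 57.6, σ = 2.35, n = 80, two-tailed). The function name `z_test` and its `tail` argument are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, tail):
    """Return (z, P-value) for a z-test of a mean with sigma known."""
    z = (xbar - mu0) / (sigma / sqrt(n))  # standardized test statistic
    cdf = NormalDist().cdf(z)
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    elif tail == "two":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p_value


alpha = 0.10
z, p_value = z_test(57.6, 58, 2.35, 80, "two")
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.4f}, P = {p_value:.4f}, {decision}")
```

The rejection-region approach would instead compare z with the critical values for the chosen α; both approaches lead to the same decision.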


3. P = 0.0934; Reject H₀. 5. P = 0.0069; Reject H₀. 
7. P = 0.0930; Fail to reject H₀. 
9. b 10. d 11. c 12. a 
13. (a) Fail to reject H₀. 
(b) Reject H₀. 
15. Fail to reject H₀. 

17. 1.645 
[Graph omitted; z₀ = 1.645] 

19. −1.88 
[Graph omitted] 

21. −2.33, 2.33 
[Graph omitted; −z₀ = −2.33, z₀ = 2.33] 

23. (a) Fail to reject H₀ because z < 1.285. 
(b) Fail to reject H₀ because z < 1.285. 
(c) Fail to reject H₀ because z < 1.285. 
(d) Reject H₀ because z > 1.285. 

25. Reject H₀. There is enough evidence at the 5% level of 
significance to reject the claim. 

27. Reject H₀. There is enough evidence at the 2% level of 
significance to support the claim. 
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29. (a) Ho: w = 30; H,: w > 30 (claim) 
(c) 0.0023 (d) Reject Hp. 
(e) There is enough evidence at the 1% level of 
significance to support the student’s claim that the 


mean raw score for the school’s applicants is more 
than 30. 


31. (a) Ho: w = 28.5 (claim); H,: w # 28.5 

(b) —1.71; 0.0436  (c) 0.0872 (Tech: 0.0878) 

(d) Fail to reject Hp. 

(e) There is not enough evidence at the 8% level of 
significance to reject the U.S. Department of 
Agriculture’s claim that the mean consumption of 
bottled water by a person in the United States is 
28.5 gallons per year. 


33. (a) Ho: w = 15 (claim); H,: w # 15 
(b) —0.22; 0.4129 (Tech: 0.4135) 
(c) 0.8258 (Tech: 0.8270) (d) Fail to reject Hp. 


(e) There is not enough evidence at the 5% level of 
significance to reject the claim that the mean time it 


(b) 2.83; 0.9977 


takes smokers to quit smoking permanently is 15 years. 


35. (a) H0: μ = 40 (claim); Ha: μ ≠ 40 
(b) −z0 = −2.575, z0 = 2.575; 
Rejection regions: z < −2.575, z > 2.575 
(c) −0.584 (d) Fail to reject H0. 
(e) There is not enough evidence at the 1% level of 
significance to reject the company’s claim that the 
mean caffeine content per 12-ounce bottle of cola is 
40 milligrams. 

37. (a) H0: μ ≥ 750 (claim); Ha: μ < 750 
(b) z0 = −2.05; Rejection region: z < −2.05 
(c) −0.5 (d) Fail to reject H0. 
(e) There is not enough evidence at the 2% level of 
significance to reject the light bulb manufacturer’s 
claim that the mean life of the bulb is at least 
750 hours. 

39. (a) H0: μ = 32; Ha: μ > 32 (claim) 
(b) z0 = 1.555; Rejection region: z > 1.555 
(c) −1.478 (d) Fail to reject H0. 
(e) There is not enough evidence at the 6% level of 
significance to support the scientist’s claim that the 
mean nitrogen dioxide level in Calgary is greater than 
32 parts per billion. 
41. (a) H0: μ ≥ 10 (claim); Ha: μ < 10 
(b) z0 = −1.88; Rejection region: z < −1.88 
(c) −0.51 (d) Fail to reject H0. 
(e) There is not enough evidence at the 3% level of 
significance to reject the weight loss program’s claim 
that the mean weight loss after one month is at least 
10 pounds. 

43. Hypothesis test results: 

μ : population mean 
H0 : μ = 58 
Ha : μ ≠ 58 
Standard deviation = 2.35 

Mean   n     Sample Mean   Std. Err.   Z-Stat       P-value 
μ      80    57.6          0.262738    −1.5224292   0.1279 

P = 0.1279 > 0.10, so fail to reject H0. There is not 
enough evidence at the 10% level of significance to reject 
the claim. 

45. Hypothesis test results: 

μ : population mean 
H0 : μ = 1210 
Ha : μ > 1210 
Standard deviation = 205.87 

Mean   n     Sample Mean   Std. Err.    Z-Stat      P-value 
μ      250   1234.21       13.020362    1.8593953   0.0315 

P = 0.0315 < 0.08, so reject H0. There is enough evidence 
at the 8% level of significance to reject the claim. 

47. Fail to reject H0 because the standardized test statistic 
z = −1.86 is not in the rejection region (z < −2.33). 

49. b, d; If α = 0.05, the rejection region is z < −1.645; 
because z = −1.86 is in the rejection region, you can reject 
H0. If n = 50, the standardized test statistic is z = −2.40; 
because z = −2.40 is in the rejection region (z < −2.33), 
you can reject H0. 
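
The two decision rules used throughout these answers (compare z with a critical value, or compare P with α) can be sketched in a few lines of Python. This is only an illustrative sketch, not the textbook's or StatCrunch's procedure: the function name `one_sample_z_test` and its argument names are assumptions introduced here. The inputs in the usage lines are taken from the first hypothesis test output above (μ0 = 58, σ = 2.35, n = 80, sample mean 57.6, α = 0.10).

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha, tail="two"):
    """Hypothetical helper: z-test for a mean with known sigma.

    Returns (z, p_value, reject_by_p_value, reject_by_rejection_region).
    tail is "left", "right", or "two".
    """
    std_err = sigma / sqrt(n)
    z = (sample_mean - mu0) / std_err
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        p_value = std_normal.cdf(z)
        in_region = z < std_normal.inv_cdf(alpha)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
        in_region = z > std_normal.inv_cdf(1 - alpha)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        in_region = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    return z, p_value, p_value <= alpha, in_region


# Inputs from the two-tailed output above (claim: mu = 58).
z, p, reject_p, reject_region = one_sample_z_test(57.6, 58, 2.35, 80, 0.10)
print(round(z, 4), round(p, 4), reject_p, reject_region)
```

Both rules should lead to the same decision for a given α; the sketch returns them separately only so the agreement can be checked.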


Section 7.3 (page 393) 


1. Identify the level of significance α and the degrees of 
freedom, d.f. = n − 1. Find the critical value(s) using 
the t-distribution table in the row with n − 1 d.f. If the 
hypothesis test is 

(1) left-tailed, use the “One Tail, α” column with a 
negative sign. 

(2) right-tailed, use the “One Tail, α” column with a 
positive sign. 

(3) two-tailed, use the “Two Tails, α” column with a 
negative and a positive sign. 

3. 1.717 5. −1.328 

7. −2.056, 2.056 

9. (a) Fail to reject H0 because t > −2.086. 
(b) Fail to reject H0 because t > −2.086. 
(c) Fail to reject H0 because t > −2.086. 
(d) Reject H0 because t < −2.086. 

11. (a) Fail to reject H0 because −2.602 < t < 2.602. 
(b) Fail to reject H0 because −2.602 < t < 2.602. 
(c) Reject H0 because t > 2.602. 
(d) Reject H0 because t < −2.602. 

13. Fail to reject H0. There is not enough evidence at the 1% 
level of significance to reject the claim. 

15. Reject H0. There is enough evidence at the 1% level of 
significance to reject the claim. 

17. (a) H0: μ = 18,000 (claim); Ha: μ ≠ 18,000 
(b) −t0 = −2.145, t0 = 2.145; 
Rejection regions: t < −2.145, t > 2.145 
(c) 1.21 (d) Fail to reject H0. 
(e) There is not enough evidence at the 5% level of 
significance to reject the dealer’s claim that the mean 
price of a 2008 Subaru Forester is $18,000. 

19. (a) H0: μ = 60; Ha: μ > 60 (claim) 
(b) t0 = 1.943; Rejection region: t > 1.943 
(c) 2.12 (d) Reject H0. 
(e) There is enough evidence at the 5% level of 
significance to support the board’s claim that the mean 
number of hours worked per week by surgical faculty 
who teach at an academic institution is more than 
60 hours. 

21. (a) H0: μ = 1; Ha: μ > 1 (claim) 
(b) t0 = 1.356; Rejection region: t > 1.356 
(c) 6.44 (d) Reject H0. 
(e) There is enough evidence at the 10% level of 
significance to support the environmentalist’s claim 
that the mean amount of waste recycled by adults in 
the United States is more than 1 pound per person 
per day. 

23. (a) H0: μ = $26,000 (claim); Ha: μ ≠ $26,000 
(b) −t0 = −2.262, t0 = 2.262; 
Rejection regions: t < −2.262, t > 2.262 
(c) −0.15 (d) Fail to reject H0. 
(e) There is not enough evidence at the 5% level of 
significance to reject the employment information 
service’s claim that the mean salary for full-time male 
workers over age 25 without a high school diploma 
is $26,000. 

25. (a) H0: μ = 45; Ha: μ > 45 (claim) 
(b) 0.0052 (c) Reject H0. 
(d) There is enough evidence at the 10% level of 
significance to support the county’s claim that the 
mean speed of the vehicles is greater than 45 miles 
per hour. 

27. (a) H0: μ = $105 (claim); Ha: μ ≠ $105 
(b) 0.0165 (c) Fail to reject H0. 
(d) There is not enough evidence at the 1% level of 
significance to reject the travel association’s claim that 
the mean daily meal cost for two adults traveling 
together on vacation in San Francisco is $105. 

29. (a) H0: μ = 32; Ha: μ < 32 (claim) 
(b) 0.0344 (c) Reject H0. 
(d) There is enough evidence at the 5% level of 
significance to support the brochure’s claim that 
the mean class size for full-time faculty is fewer than 
32 students. 


31. Hypothesis test results: 

μ : population mean 
H0 : μ = 75 

Mean   Sample Mean   Std. Err.    DF   T-Stat       P-value 
μ      73.6          0.62757164   25   −2.2308211   0.9825 

P = 0.9825 > 0.05, so fail to reject H0. There is not enough 
evidence at the 5% level of significance to reject the claim. 

33. Hypothesis test results: 

μ : population mean 
H0 : μ = 188 
Ha : μ < 188 

Mean   Sample Mean   Std. Err.   DF   T-Stat   P-value 
μ      186           4           8    −0.5     0.3153 

P = 0.3153 > 0.05, so fail to reject H0. There is not enough 
evidence at the 5% level of significance to support the 
claim. 

35. Because the P-value = 0.0748 > 0.05, fail to reject H0. 

37. Use the t-distribution because the population is normal, 
n < 30, and σ is unknown. 

Fail to reject H0. There is not enough evidence at the 5% 
level of significance to reject the car company’s claim 
that the mean gas mileage for the luxury sedan is at least 
23 miles per gallon. 

39. More likely; For degrees of freedom less than 30, the tails 
of a t-distribution curve are thicker than those of a 
standard normal distribution curve. So, if you incorrectly 
use a standard normal sampling distribution instead of a 
t-sampling distribution, the area under the curve at the tails 
will be smaller than what it would be for the t-test, meaning 
the critical value(s) will lie closer to the mean. This makes 
it more likely for the test statistic to be in the rejection 
region(s). This result is the same regardless of whether the 
test is left-tailed, right-tailed, or two-tailed; in each case, the 
tail thickness affects the location of the critical value(s). 
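
The table lookup described in the first answer of this section (row n − 1, column chosen by tail type) can be approximated numerically. The sketch below is an assumption-laden stand-in for the printed t-table, not the book's method: it integrates the t density with Simpson's rule and finds the critical value by bisection, using only the standard library. The names `t_pdf`, `t_cdf`, and `t_critical` are introduced here for illustration, and the degrees of freedom in the usage lines are illustrative choices.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha, df, tail):
    """Critical value(s) as a tuple; tail is "left", "right", or "two"."""
    tail_area = alpha / 2 if tail == "two" else alpha
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the upper-tail area
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > tail_area:
            lo = mid  # tail still too large, move right
        else:
            hi = mid
    t0 = (lo + hi) / 2
    if tail == "left":
        return (-t0,)
    if tail == "right":
        return (t0,)
    return (-t0, t0)


# Illustrative lookups (df values chosen here, not taken from the exercises).
print([round(v, 3) for v in t_critical(0.05, 14, "two")])
print([round(v, 3) for v in t_critical(0.05, 6, "right")])
```

The thicker tails discussed in the last answer show up directly: for small d.f. the value returned is farther from 0 than the corresponding standard normal critical value.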


Section 7.3 Activity (page 397) 


1-3. Answers will vary. 


Section 7.4 (page 401) 
1. If np ≥ 5 and nq ≥ 5, the normal distribution can be 
used. 
3. Cannot use normal distribution. 
5. Can use normal distribution. 

Fail to reject H0. There is not enough evidence at the 5% 
level of significance to support the claim. 

7. Can use normal distribution. 

Fail to reject H0. There is not enough evidence at the 5% 
level of significance to reject the claim. 

9. (a) H0: p = 0.25; Ha: p < 0.25 (claim) 
(b) z0 = −1.645; Rejection region: z < −1.645 
(c) −2.12 (d) Reject H0. 
(e) There is enough evidence at the 5% level of 
significance to support the researcher’s claim that less 
than 25% of U.S. adults are smokers. 

11. (a) H0: p ≤ 0.50 (claim); Ha: p > 0.50 
(b) z0 = 2.33; Rejection region: z > 2.33 
(c) 1.96 (d) Fail to reject H0. 
(e) There is not enough evidence at the 1% level of 
significance to reject the research center’s claim that 
at most 50% of people believe that drivers should be 
allowed to use cellular phones with hands-free devices 
while driving. 

13. (a) H0: p = 0.75; Ha: p > 0.75 (claim) 
(b) z0 = 1.28; Rejection region: z > 1.28 
(c) 1.98 (d) Reject H0. 
(e) There is enough evidence at the 10% level of 
significance to support the research center’s claim that 
more than 75% of females ages 20-29 are taller than 
62 inches. 

15. (a) H0: p = 0.35; Ha: p < 0.35 (claim) 
(b) z0 = −1.28; Rejection region: z < −1.28 
(c) 1.68 (d) Fail to reject H0. 
(e) There is not enough evidence at the 10% level of 
significance to support the humane society’s claim that 
less than 35% of U.S. households own a dog. 

17. Fail to reject H0. There is not enough evidence at the 5% 
level of significance to reject the claim that at least 52% of 
adults are more likely to buy a product when there are free 
samples. 

19. (a) H0: p = 0.35; Ha: p < 0.35 (claim) 
(b) z0 = −1.28; Rejection region: z < −1.28 
(c) 1.68 (d) Fail to reject H0. 
(e) There is not enough evidence at the 10% level of 
significance to support the humane society’s claim that 
less than 35% of U.S. households own a dog. 

The results are the same. 
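
The pattern in these answers (check np and nq first, then compare the standardized statistic with z0) can be written as a short Python sketch. The answer key does not give the sample counts behind its z values, so the numbers in the usage line (35 smokers in a sample of 200) are hypothetical, as are the function and parameter names.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha, tail):
    """Hypothetical helper: z-test for a proportion.

    Returns (z, z0, reject). tail is "left", "right", or "two".
    For a two-tailed test z0 is the positive critical value.
    """
    q0 = 1 - p0
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("cannot use the normal distribution: need np >= 5 and nq >= 5")
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * q0 / n)
    std_normal = NormalDist()
    if tail == "left":
        z0 = std_normal.inv_cdf(alpha)
        reject = z < z0
    elif tail == "right":
        z0 = std_normal.inv_cdf(1 - alpha)
        reject = z > z0
    else:
        z0 = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > z0
    return z, z0, reject


# Hypothetical data: 35 of 200 sampled adults smoke; Ha: p < 0.25, alpha = 0.05.
z, z0, reject = proportion_z_test(35, 200, 0.25, 0.05, "left")
print(round(z, 3), round(z0, 3), reject)
```

Raising an error when np or nq is below 5 mirrors the "Cannot use normal distribution" answers, where no test is carried out.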


Section 7.4 Activity (page 403) 


1-2. Answers will vary. 


Section 7.5 (page 410) 


1. Specify the level of significance α. Determine the degrees 
of freedom. Determine the critical values using the 
χ²-distribution. For a right-tailed test, use the value that 
corresponds to d.f. and α; for a left-tailed test, use the 
value that corresponds to d.f. and 1 − α; for a two-tailed 
test, use the values that correspond to d.f. and ½α, and 
d.f. and 1 − ½α. 

3. The requirement of a normal distribution is more 
important when testing a standard deviation than when testing 
a mean. If the population is not normal, the results of a 
χ²-test can be misleading because the χ²-test is not as 
robust as the tests for the population mean. 

5. 38.885 7. 0.872 9. 60.391, 101.879 

11. (a) Fail to reject H0. (b) Fail to reject H0. 
(c) Fail to reject H0. (d) Reject H0. 

13. (a) Fail to reject H0. (b) Reject H0. 
(c) Reject H0. (d) Fail to reject H0. 

15. Fail to reject H0. There is not enough evidence at the 5% 
level of significance to reject the claim. 

17. Reject H0. There is enough evidence at the 10% level of 
significance to reject the claim. 

19. (a) H0: σ² = 1.25 (claim); Ha: σ² ≠ 1.25 
(b) χL² = 10.283, χR² = 35.479; 
Rejection regions: χ² < 10.283, χ² > 35.479 
(c) 22.68 (d) Fail to reject H0. 
(e) There is not enough evidence at the 5% level of 
significance to reject the manufacturer’s claim that the 
variance of the number of grams of carbohydrates in 
servings of its tortilla chips is 1.25. 

21. (a) H0: σ = 36; Ha: σ < 36 (claim) 
(b) χ0² = 13.240; Rejection region: χ² < 13.240 
(c) 18.076 (d) Fail to reject H0. 
(e) There is not enough evidence at the 10% level of 
significance to support the test administrator’s claim 
that the standard deviation for eighth graders on the 
examination is less than 36 points. 

23. (a) H0: σ ≤ 25 (claim); Ha: σ > 25 
(b) χ0² = 36.741; Rejection region: χ² > 36.741 
(c) 41.515 (d) Reject H0. 
(e) There is enough evidence at the 10% level of 
significance to reject the weather service’s claim that 
the standard deviation of the number of fatalities per 
year from tornadoes is no more than 25. 

25. (a) H0: σ = $3500; Ha: σ < $3500 (claim) 
(b) χ0² = 18.114; Rejection region: χ² < 18.114 
(c) 37.051 (d) Fail to reject H0. 
(e) There is not enough evidence at the 10% level of 
significance to support the insurance agent’s claim that 
the standard deviation of the total charges for patients 
involved in a crash in which the vehicle struck a 
construction barricade is less than $3500. 

27. (a) H0: σ = $6100; Ha: σ > $6100 (claim) 
(b) χ0² = 27.587; Rejection region: χ² > 27.587 
(c) 27.897 (d) Reject H0. 
(e) There is enough evidence at the 5% level of 
significance to support the claim that the standard 
deviation of the annual salaries of environmental 
engineers is greater than $6100. 

29. Hypothesis test results: 

σ² : variance of Variable 
H0 : σ² = 9 
Ha : σ² < 9 

Variance   Sample Var.   DF   Chi-Square Stat   P-value 
σ²         2.03          9    2.03              0.009 

Reject H0. There is enough evidence at the 1% level of 
significance to reject the claim. 

31. σ² = 4.5² = 20.25 
s² = 5.8² = 33.64 
Hypothesis test results: 

σ² : variance of Variable 
H0 : σ² = 20.25 
Ha : σ² > 20.25 

Variance   Sample Var.   DF   Chi-Square Stat   P-value 
σ²         33.64         14   23.257284         0.0562 

Fail to reject H0. There is not enough evidence at the 
5% level of significance to support the claim. 

33. P-value = 0.9059; Fail to reject H0. 
35. P-value = 0.0462; Reject H0. 
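
The chi-square statistics in the outputs for Exercises 29 and 31 follow from χ² = (n − 1)s²/σ², with n − 1 equal to the DF column. The sketch below recomputes them and approximates the P-values with a series for the regularized lower incomplete gamma function. It is a minimal sketch under stated assumptions: the function names are introduced here, and the series is a standard-library stand-in for whatever routine the software that produced the output actually uses.

```python
from math import exp, lgamma, log


def chi_square_statistic(sample_variance, n, sigma0_squared):
    """Test statistic (n - 1) * s^2 / sigma0^2."""
    return (n - 1) * sample_variance / sigma0_squared


def chi_square_cdf(x, df):
    """P(X <= x) for a chi-square variable, via a power series.

    Uses P(a, t) = t^a e^(-t) / Gamma(a + 1) * sum_k t^k / ((a+1)...(a+k))
    with a = df / 2 and t = x / 2.
    """
    if x <= 0:
        return 0.0
    a, t = df / 2, x / 2
    term = total = 1.0
    k = 1
    while term > 1e-15 * total:
        term *= t / (a + k)
        total += term
        k += 1
    return total * exp(a * log(t) - t - lgamma(a + 1))


# Exercise 31 output: s^2 = 33.64, DF = 14 (so n = 15), sigma0^2 = 20.25, right-tailed.
stat = chi_square_statistic(33.64, 15, 20.25)
p_right = 1 - chi_square_cdf(stat, 14)
print(round(stat, 6), round(p_right, 4))
```

For a left-tailed test such as the one in Exercise 29, the P-value is `chi_square_cdf(stat, df)` itself rather than its complement.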


Uses and Abuses for Chapter 7 (page 413) 


1. Answers will vary. 
2. Ho: p = 0.73; Answers will vary. 
3. Answers will vary. 


4. Answers will vary. 


Review Answers for Chapter 7 (page 417) 

1. H0: μ = 375 (claim); Ha: μ > 375 

3. H0: p = 0.205; Ha: p < 0.205 (claim) 

5. H0: σ = 1.9; Ha: σ > 1.9 (claim) 

7. (a) H0: p = 0.71 (claim); Ha: p ≠ 0.71 

(b) A type I error will occur if the actual proportion of 
Americans who support plans to order deep cuts in 
executive compensation at companies that have 
received federal bailout funds is 71%, but you reject 
H0: p = 0.71. 

A type II error will occur if the actual proportion is not 
71%, but you fail to reject H0: p = 0.71. 

(c) Two-tailed because the alternative hypothesis 
contains ≠. 

(d) There is enough evidence to reject the news outlet’s 
claim that the proportion of Americans who support 
plans to order deep cuts in executive compensation 
at companies that have received federal bailout funds 
is 71%. 

(e) There is not enough evidence to reject the news 
outlet’s claim that the proportion of Americans who 
support plans to order deep cuts in executive 
compensation at companies that have received federal 
bailout funds is 71%. 

9. (a) H0: σ ≤ 50 (claim); Ha: σ > 50 

(b) A type I error will occur if the actual standard 
deviation of the sodium content in one serving of a 
certain soup is no more than 50 milligrams, but you 
reject H0: σ ≤ 50. 

A type II error will occur if the actual standard 
deviation of the sodium content in one serving of a 
certain soup is more than 50 milligrams, but you fail to 
reject H0: σ ≤ 50. 

(c) Right-tailed because the alternative hypothesis 
contains >. 

(d) There is enough evidence to reject the soup maker’s 
claim that the standard deviation of the sodium 
content in one serving of a certain soup is no more 
than 50 milligrams. 

(e) There is not enough evidence to reject the soup 
maker’s claim that the standard deviation of the 
sodium content in one serving of a certain soup is 
no more than 50 milligrams. 

11. 0.1736; Fail to reject H0. 

13. H0: μ = 0.05 (claim); Ha: μ > 0.05 
z = 2.20; P-value = 0.0139 
α = 0.10 ⇒ Reject H0. 
α = 0.05 ⇒ Reject H0. 
α = 0.01 ⇒ Fail to reject H0. 

15. −2.05 
(Graph: z-axis from −3 to 3 with the critical value marked) 

17. 1.96 
(Graph: z-axis from −3 to 3 with the critical value z0 = 1.96 marked) 

19. Fail to reject H0 because −1.645 < z < 1.645. 

21. Fail to reject H0 because −1.645 < z < 1.645. 

23. Reject H0. There is enough evidence at the 5% level of 
significance to reject the claim. 

25. Fail to reject H0. There is not enough evidence at the 
1% level of significance to support the claim. 

27. Fail to reject H0. There is not enough evidence at the 
1% level of significance to reject the U.S. Department of 
Agriculture’s claim that the mean cost of raising a child 
from birth to age 2 by husband-wife families in rural areas 
is $10,380. 

29. −2.093, 2.093 31. −2.977 

33. Fail to reject H0. There is not enough evidence at the 5% 
level of significance to support the claim. 

35. Fail to reject H0. There is not enough evidence at the 10% 
level of significance to reject the claim. 

37. Reject H0. There is enough evidence at the 1% level of 
significance to reject the claim. 

39. Fail to reject H0. There is not enough evidence at the 10% 
level of significance to reject the advertisement’s claim that 
the mean monthly cost of joining a health club is $25. 

41. There is not enough evidence at the 1% level of 
significance to reject the education publication’s claim that 
the mean expenditure per student in public elementary and 
secondary schools is at least $10,200. 

43. Can use normal distribution. 
Fail to reject H0. There is not enough evidence at the 5% 
level of significance to reject the claim. 

45. Can use normal distribution. 
Fail to reject H0. There is not enough evidence at the 8% 
level of significance to support the claim. 

47. Cannot use normal distribution. 

49. Can use normal distribution. 
Fail to reject H0. There is not enough evidence at the 2% 
level of significance to support the claim. 

51. Reject H0. There is enough evidence at the 2% level of 
significance to support the polling agency’s claim that over 
16% of U.S. adults are without health care coverage. 

53. 30.144 55. 63.167 

57. Reject H0. There is enough evidence at the 10% level of 
significance to support the claim. 

59. Fail to reject H0. There is not enough evidence at the 5% 
level of significance to reject the claim. 

61. Reject H0. There is enough evidence at the 0.5% level of 
significance to reject the bolt manufacturer’s claim that the 
variance is at most 0.01. 

63. You can reject H0 at the 5% level of significance because 
χ² = 43.94 > 41.923. 


Chapter Quiz for Chapter 7 (page 421) 


1. (a) H0: μ ≥ 170 (claim); Ha: μ < 170 
(b) One-tailed because the alternative hypothesis 
contains <; z-test because n ≥ 30 
(c) z0 = −1.88; Rejection region: z < −1.88 
(d) −2.59 
(e) Reject H0. 
(f) There is enough evidence at the 3% level of 
significance to reject the service’s claim that the mean 
consumption of vegetables and melons by people in the 
United States is at least 170 pounds per person. 

2. (a) H0: μ ≥ 7.25 (claim); Ha: μ < 7.25 
(b) One-tailed because the alternative hypothesis 
contains <; t-test because n < 30, σ is unknown, and 
the population is normally distributed 
(c) t0 = −1.796; Rejection region: t < −1.796 
(d) −1.283 
(e) Fail to reject H0. 
(f) There is not enough evidence at the 5% level of 
significance to reject the company’s claim that the 
mean hat size for a male is at least 7.25. 

3. (a) H0: p ≤ 0.10 (claim); Ha: p > 0.10 
(b) One-tailed because the alternative hypothesis 
contains >; z-test because np ≥ 5 and nq ≥ 5 
(c) z0 = 1.75; Rejection region: z > 1.75 
(d) 0.75 
(e) Fail to reject H0. 
(f) There is not enough evidence at the 4% level of 
significance to reject the microwave oven maker’s 
claim that no more than 10% of its microwaves need 
repair during the first 5 years of use. 

4. (a) H0: σ = 112 (claim); Ha: σ ≠ 112 
(b) Two-tailed because the alternative hypothesis contains 
≠; χ²-test because the test is for a standard deviation 
and the population is normally distributed 
(c) χL² = 9.390, χR² = 28.869; 
Rejection regions: χ² < 9.390, χ² > 28.869 
(d) 29.343 
(e) Reject H0. 
(f) There is enough evidence at the 10% level of 
significance to reject the state school administrator’s 
claim that the standard deviation of SAT critical 
reading test scores is 112. 

5. (a) H0: μ = $62,569 (claim); Ha: μ ≠ $62,569 
(b) Two-tailed because the alternative hypothesis 
contains ≠; t-test because n < 30, σ is unknown, 
and the population is normally distributed 
(c) Not necessary (d) −2.175; 0.0473 
(e) Reject H0. 
(f) There is enough evidence at the 5% level of 
significance to reject the agency’s claim that the mean 
income for full-time workers ages 25 to 34 with a 
master’s degree is $62,569. 

6. (a) H0: μ = $201 (claim); Ha: μ ≠ $201 
(b) Two-tailed because the alternative hypothesis 
contains ≠; z-test because n ≥ 30 
(c) Not necessary 
(d) 0.0030 (Tech: 0.0031) 
(e) Reject H0. 
(f) There is enough evidence at the 5% level of 
significance to reject the tourist agency’s claim that 
the mean daily cost of meals and lodging for a family 
of 4 traveling in the state of Kansas is $201. 

Real Statistics–Real Decisions for Chapter 7 (page 422) 

1. (a)-(c) Answers will vary. 

2. Fail to reject H0. There is not enough evidence at the 5% 
level of significance to support PepsiCo’s claim that more 
than 50% of cola drinkers prefer Pepsi® over Coca-Cola®. 

3. Knowing the brand may influence participants’ decisions. 

4. (a)-(c) Answers will vary. 


CHAPTER 8 


Section 8.1 (page 434) 


1. Two samples are dependent if each member of one sample 
corresponds to a member of the other sample. Example: 
The weights of 22 people before starting an exercise 
program and the weights of the same 22 people 6 weeks 
after starting the exercise program. 

Two samples are independent if the sample selected from 
one population is not related to the sample selected from 
the other population. Example: The weights of 25 cats and 
the weights of 20 dogs. 

3. Use P-values. 
5. Independent because different students were sampled. 
7. Dependent because the same football players were 
sampled. 
9. Independent because different boats were sampled. 
11. Dependent because the same tire sets were sampled. 

13. (a) 2 (b) 2.95 (c) In the rejection region. 
(d) Reject H0. There is enough evidence at the 1% level of 
significance to reject the claim. 

15. (a) 3 (b) 0.18 (c) Not in the rejection region. 
(d) Fail to reject H0. There is not enough evidence at the 
5% level of significance to support the claim. 

17. Fail to reject H0. There is not enough evidence at the 
1% level of significance to support the claim. 

19. Reject H0. 

21. (a) The claim is “the mean braking distances are different 
for the two types of tires.” 
H0: μ1 = μ2; Ha: μ1 ≠ μ2 (claim) 
(b) −z0 = −1.645, z0 = 1.645; 
Rejection regions: z < −1.645, z > 1.645 
(c) −2.786 (d) Reject H0. 
(e) There is enough evidence at the 10% level of 
significance to support the safety engineer’s claim that 
the mean braking distances are different for the two 
types of tires. 

23. (a) The claim is “Region A’s average wind speed is greater 
than Region B’s.” 
H0: μ1 ≤ μ2; Ha: μ1 > μ2 (claim) 
(b) z0 = 1.645; Rejection region: z > 1.645 
(c) 1.53 (d) Fail to reject H0. 
(e) There is not enough evidence at the 5% level of 
significance to conclude that Region A’s average wind 
speed is greater than Region B’s. 

25. (a) The claim is “male and female high school students 
have equal ACT scores.” 
H0: μ1 = μ2 (claim); Ha: μ1 ≠ μ2 
(b) −z0 = −2.575, z0 = 2.575; 
Rejection regions: z < −2.575, z > 2.575 
(c) 0.202 (d) Fail to reject H0. 
(e) There is not enough evidence at the 1% level of 
significance to reject the claim that male and female 
high school students have equal ACT scores. 

27. (a) The claim is “the average home sales price in Dallas, 
Texas is the same as in Austin, Texas.” 
H0: μ1 = μ2 (claim); Ha: μ1 ≠ μ2 
(b) −z0 = −1.645, z0 = 1.645; 
Rejection regions: z < −1.645, z > 1.645 
(c) −1.30 (d) Fail to reject H0. 
(e) There is not enough evidence at the 10% level of 
significance to reject the real estate agency’s claim that 
the average home sales price in Dallas, Texas is the 
same as in Austin, Texas. 

29. (a) The claim is “the average home sales price in Dallas, 
Texas is the same as in Austin, Texas.” 
H0: μ1 = μ2 (claim); Ha: μ1 ≠ μ2 
(b) −z0 = −1.645, z0 = 1.645; 
Rejection regions: z < −1.645, z > 1.645 
(c) 1.86 (d) Reject H0. 
(e) There is enough evidence at the 10% level of 
significance to reject the real estate agency’s claim that 
the average home sales price in Dallas, Texas is the 
same as in Austin, Texas. 

The new samples do lead to a different conclusion. 

31. (a) The claim is “children ages 6-17 spent more time 
watching television in 1981 than children ages 6-17 
do today.” 
H0: μ1 = μ2; Ha: μ1 > μ2 (claim) 
(b) z0 = 1.96; Rejection region: z > 1.96 
(c) 3.01 (d) Reject H0. 
(e) There is enough evidence at the 2.5% level of 
significance to support the sociologist’s claim that 
children ages 6-17 spent more time watching 
television in 1981 than children ages 6-17 do today. 

33. (a) The claim is “there is no difference in the mean 
washer diameter manufactured by two different 
methods.” 
H0: μ1 = μ2 (claim); Ha: μ1 ≠ μ2 
(b) −z0 = −2.575, z0 = 2.575; 
Rejection regions: z < −2.575, z > 2.575 
(c) 64.978 (d) Reject H0. 
(e) There is enough evidence at the 1% level of 
significance to reject the production engineer’s claim 
that there is no difference in the mean washer diameter 
manufactured by two different methods. 

35. They are equivalent through algebraic manipulation of the 
equation. 
μ1 = μ2 ⇒ μ1 − μ2 = 0 

37. Hypothesis test results: 

μ1 : mean of population 1 (Std. Dev. = 5.4) 
μ2 : mean of population 2 (Std. Dev. = 7.5) 
μ1 − μ2 : mean difference 
H0 : μ1 − μ2 = 0 
Ha : μ1 − μ2 ≠ 0 

Difference   n1   n2   Sample Mean   Std. Err.   Z-Stat      P-value 
μ1 − μ2      50   45   4             1.3539572   2.9543033   0.0031 

P = 0.0031 < 0.01, so reject H0. 
There is enough evidence at the 1% level of significance to 
support the claim. 

39. Hypothesis test results: 

μ1 : mean of population 1 (Std. Dev. = 0.92) 
μ2 : mean of population 2 (Std. Dev. = 0.73) 
μ1 − μ2 : mean difference 
H0 : μ1 − μ2 = 0 
Ha : μ1 − μ2 < 0 

Difference   n1   n2   Sample Mean   Std. Err.   Z-Stat       P-value 
μ1 − μ2      35   40   −0.32         0.193663    −1.6523548   0.0492 

P = 0.0492 < 0.05, so reject H0. 
There is enough evidence at the 5% level of significance to 
reject the claim. 

41. H0: μ1 − μ2 = −9 (claim); Ha: μ1 − μ2 ≠ −9 
Fail to reject H0. 
There is not enough evidence at the 1% level of 
significance to reject the claim that children spend 9 hours 
a week more in day care or preschool today than in 1981. 

43. H0: μ1 − μ2 = 10,000; Ha: μ1 − μ2 > 10,000 (claim) 
Reject H0. There is enough evidence at the 5% level of 
significance to support the claim that the difference in 
mean annual salaries of microbiologists in Maryland and 
California is more than $10,000. 

45. −3.6 < μ1 − μ2 < −0.2 

47. H0: μ1 ≥ μ2; Ha: μ1 < μ2 (claim) 
Reject H0. There is enough evidence at the 5% level of 
significance to support the claim. You should recommend 
the DASH diet and exercise program over the traditional 
diet and exercise program because the mean systolic blood 
pressure was significantly lower in the DASH program. 

49. The 95% CI for μ1 − μ2 in Exercise 45 contained only 
values less than 0 and, as found in Exercise 47, there was 
enough evidence at the 5% level of significance to support 
the claim. 

If the CI for μ1 − μ2 contains only negative numbers, you 
reject H0 because the null hypothesis states that μ1 − μ2 is 
greater than or equal to 0. 
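
The standard errors and z statistics in the two software outputs above can be reproduced from the quantities they list. The Python sketch below is illustrative only; `two_sample_z_test` and its parameter names are assumptions introduced here. The usage lines take their inputs from the first output (standard deviations 5.4 and 7.5, n1 = 50, n2 = 45, sample mean difference 4, two-tailed).

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean_diff, sigma1, n1, sigma2, n2,
                      hypothesized_diff=0.0, tail="two"):
    """Hypothetical helper: z-test for mu1 - mu2 with independent samples.

    Returns (std_err, z, p_value). tail is "left", "right", or "two".
    """
    std_err = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (mean_diff - hypothesized_diff) / std_err
    std_normal = NormalDist()
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return std_err, z, p_value


# Inputs from the first output above (Ha: mu1 - mu2 != 0).
std_err, z, p = two_sample_z_test(4, 5.4, 50, 7.5, 45)
print(round(std_err, 7), round(z, 7), round(p, 4))
```

The `hypothesized_diff` argument covers null hypotheses such as μ1 − μ2 = −9, where the difference under H0 is not zero.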


Section 8.2 (page 446) 


1. (1) The samples must be randomly selected. 
(2) The samples must be independent. 
(3) Each population must have a normal distribution. 

3. (a) −t0 = −1.714, t0 = 1.714 
(b) −t0 = −1.812, t0 = 1.812 

5. (a) t0 = −1.746 
(b) t0 = −1.943 

7. (a) t0 = 1.729 
(b) t0 = 1.895 

9. (a) −1.8 (b) −1.70 
(c) Not in the rejection region. 
(d) Fail to reject H0. 

11. (a) 105 (b) 2.05 
(c) In the rejection region. 
(d) Reject H0. 

13. (a) The claim is “the mean annual costs of routine 
veterinarian visits for dogs and cats are the same.” 
H0: μ1 = μ2 (claim); Ha: μ1 ≠ μ2 
(b) −t0 = −1.943, t0 = 1.943; 
Rejection regions: t < −1.943, t > 1.943 
(c) 1.90 (d) Fail to reject H0. 
(e) There is not enough evidence at the 10% level of 
significance to reject the pet association’s claim that the 
mean annual costs of routine veterinarian visits for 
dogs and cats are the same. 

15. (a) The claim is “the mean bumper repair cost is less for 
mini cars than for midsize cars.” 
H0: μ1 = μ2; Ha: μ1 < μ2 (claim) 
(b) t0 = −1.325; Rejection region: t < −1.325 
(c) −0.93 (d) Fail to reject H0. 
(e) There is not enough evidence at the 10% level of 
significance to support the claim that the mean bumper 
repair cost is less for mini cars than for midsize cars. 

17. (a) The claim is “the mean household income is greater in 
Allegheny County than it is in Erie County.” 
H0: μ1 = μ2; Ha: μ1 > μ2 (claim) 
(b) t0 = 1.761; Rejection region: t > 1.761 
(c) 1.99 (d) Reject H0. 
(e) There is enough evidence at the 5% level of 
significance to support the personnel director’s claim 
that the mean household income is greater in 
Allegheny County than it is in Erie County. 

19. (a) The claim is “the new treatment makes a difference in 
the tensile strength of steel bars.” 
H0: μ1 = μ2; Ha: μ1 ≠ μ2 (claim) 
(b) −t0 = −2.831, t0 = 2.831; 
Rejection regions: t < −2.831, t > 2.831 
(c) −2.76 (d) Fail to reject H0. 
(e) There is not enough evidence at the 1% level of 
significance to support the claim that the new treatment 
makes a difference in the tensile strength of steel bars. 

21. (a) The claim is “the new method of teaching reading 
produces higher reading test scores than the old method.” 
H0: μ1 = μ2; Ha: μ1 < μ2 (claim) 
(b) t0 = −1.282; Rejection region: t < −1.282 
(c) −4.295 (d) Reject H0. 
(e) There is enough evidence at the 10% level of 
significance to support the claim that the new method 
of teaching reading produces higher reading test scores 
than the old method and to recommend changing to 
the new method. 

23. Hypothesis test results: 

μ1 : mean of population 1 
μ2 : mean of population 2 
μ1 − μ2 : mean difference 
H0 : μ1 − μ2 = 0 
Ha : μ1 − μ2 > 0 
(with pooled variances) 

Difference   Sample Mean   Std. Err.    DF   T-Stat        P-value 
μ1 − μ2      −8            16.985794    22   −0.47098184   0.6789 

P = 0.6789 > 0.10, so fail to reject H0. 
There is not enough evidence at the 10% level of 
significance to support the claim. 

25. Hypothesis test results: 

μ1 : mean of population 1 
μ2 : mean of population 2 
μ1 − μ2 : mean difference 
H0 : μ1 − μ2 = 0 
Ha : μ1 − μ2 < 0 
(without pooled variances) 

Difference   Sample Mean   Std. Err.   DF          T-Stat       P-value 
μ1 − μ2      −43           28.12301    18.990595   −1.5289971   0.0714 

P = 0.0714 > 0.05, so fail to reject H0. 
There is not enough evidence at the 5% level of 
significance to reject the claim. 

27. 45 < μ1 − μ2 < 307 29. 11 < μ1 − μ2 < 35 
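
The two software outputs above are labeled "with pooled variances" and "without pooled variances". The sketch below shows, under stated assumptions, how the standard error and degrees of freedom differ between the two. The answer key gives no sample standard deviations or sample sizes for those exercises, so the summary statistics in the usage lines are hypothetical. The non-integer DF in the second output suggests a Satterthwaite-type approximation, which is what `welch_t` uses; a hand calculation might instead take the smaller of n1 − 1 and n2 − 1.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic and d.f. assuming equal population variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


def welch_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic and approximate d.f. without assuming equal variances."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    std_err = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / std_err, df


# Hypothetical summary statistics, not taken from the exercises.
print(pooled_t(13, 3, 9, 10, 4, 16))
print(welch_t(13, 3, 9, 10, 4, 16))
```

With the same summary statistics the two versions generally give different t values and different degrees of freedom, which is why the output states which one was used.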


Section 8.3 (page 456) 


1. (1) Each sample must be randomly selected.
(2) Each member of the first sample must be paired with a
member of the second sample.
(3) Both populations must be normally distributed.

3. Left-tailed test; Fail to reject H0.
5. Right-tailed test; Reject H0.
7. Left-tailed test; Reject H0.

9. (a) The claim is “a grammar seminar will help students
reduce the number of grammatical errors.”
H0: μd = 0; Ha: μd > 0 (claim)
(b) t0 = 3.143; Rejection region: t > 3.143
(c) d̄ ≈ 3.143; s_d ≈ 2.035
(d) 4.085 (e) Reject H0.
(f) There is enough evidence at the 1% level of
significance to support the teacher’s claim that a
grammar seminar will help students reduce the number
of grammatical errors.

11. (a) The claim is “a particular exercise program will help
participants lose weight after one month.”
H0: μd = 0; Ha: μd > 0 (claim)
(b) t0 = 1.363; Rejection region: t > 1.363
(c) d̄ = 3.75; s_d ≈ 7.841
(d) 1.657 (e) Reject H0.
(f) There is enough evidence at the 10% level of
significance to support the nutritionist’s claim that the
exercise program helps participants lose weight after
one month.

13. (a) The claim is “soft tissue therapy and spinal
manipulation help to reduce the length of time patients
suffer from headaches.”
H0: μd = 0; Ha: μd > 0 (claim)
(b) t0 = 2.764; Rejection region: t > 2.764
(c) d̄ ≈ 1.255; s_d ≈ 0.441
(d) 9.429 (e) Reject H0.
(f) There is enough evidence at the 1% level of
significance to support the physical therapist’s claim
that soft tissue therapy and spinal manipulation help
reduce the length of time patients suffer from
headaches.

15. (a) The claim is “the new drug reduces systolic blood
pressure.”
H0: μd = 0; Ha: μd > 0 (claim)
(b) t0 = 1.895; Rejection region: t > 1.895
(c) d̄ = 14.75; s_d ≈ 6.861
(d) 6.081 (e) Reject H0.
(f) There is enough evidence at the 5% level of
significance to support the pharmaceutical company’s
claim that its new drug reduces systolic blood pressure.
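
The standardized test statistic in part (d) of these paired-sample answers comes from the mean and standard deviation of the differences in part (c). The sketch below reproduces the blood pressure answer under one assumption that the answer key does not state: n = 8 pairs, inferred from the critical value 1.895 (7 degrees of freedom for a right-tailed test at the 5% level). The function name `paired_t` is hypothetical.

```python
import math


def paired_t(d_bar: float, s_d: float, n: int) -> float:
    """Paired-samples statistic t = d_bar / (s_d / sqrt(n)) for a hypothesized mean difference of 0."""
    return d_bar / (s_d / math.sqrt(n))


# d_bar and s_d come from part (c); n = 8 is an assumption (d.f. = 7).
t = paired_t(14.75, 6.861, 8)
t0 = 1.895  # critical value from part (b)

print(round(t, 3))
print("reject H0" if t > t0 else "fail to reject H0")
```

Because s_d is itself rounded in the answer key, other answers recomputed this way can differ from the printed statistic in the last decimal place.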
17. (a) The claim is “the product ratings have changed from
last year to this year.”
H0: μd = 0; Ha: μd ≠ 0 (claim)
(b) -t0 = -2.365, t0 = 2.365
Rejection regions: t < -2.365, t > 2.365
(c) d̄ = -1; s_d ≈ 1.309
(d) -2.160 (e) Fail to reject H0.
(f) There is not enough evidence at the 5% level of
significance to support the claim that the product
ratings have changed from last year to this year.


19. Hypothesis test results:
μ1 - μ2: mean of the paired difference between
Cholesterol (before) and Cholesterol (after)
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 > 0

Difference | Sample Diff. | Std. Err. | DF | T-Stat | P-value
Cholesterol (before) - Cholesterol (after) | 2.857143 | 1.6822401 | 6 | 1.6984155 | 0.0702

P = 0.0702 > 0.05, so fail to reject H0.

There is not enough evidence at the 5% level of
significance to support the claim that the new cereal lowers
total blood cholesterol levels.

21. Yes; P ≈ 0.0003 < 0.05, so you reject H0.

23. -1.76 < μd < -1.29


Section 8.4 (page 465) 


1. (1) The samples must be randomly selected. 
(2) The samples must be independent. 
(3) n1p̄ ≥ 5, n1q̄ ≥ 5, n2p̄ ≥ 5, and n2q̄ ≥ 5

3. Can use normal sampling distribution; Fail to reject H0.
5. Can use normal sampling distribution; Reject H0.
7. Can use normal sampling distribution; Fail to reject H0.


9. (a) The claim is “there is a difference in the proportion of 


subjects who feel all or mostly better after 4 weeks 
between subjects who used magnetic insoles and 
subjects who used nonmagnetic insoles.” 


H0: p1 = p2; Ha: p1 ≠ p2 (claim)

(b) -z0 = -2.575, z0 = 2.575;
Rejection regions: z < -2.575, z > 2.575

(c) -1.24 (d) Fail to reject H0.

(e) There is not enough evidence at the 1% level of 
significance to support the claim that there is a 
difference in the proportion of subjects who feel all 
or mostly better after 4 weeks between subjects who 


used magnetic insoles and subjects who used 
nonmagnetic insoles. 
11. (a) The claim is “the proportion of males who enrolled in
college is less than the proportion of females who
enrolled in college.”

H0: p1 = p2; Ha: p1 < p2 (claim)

(b) z0 = -1.645; Rejection region: z < -1.645

(c) -4.22 (d) Reject H0.

(e) There is enough evidence at the 5% level of 
significance to support the claim that the proportion 


of males who enrolled in college is less than the 
proportion of females who enrolled in college. 


13. (a) The claim is “the proportion of subjects who are 
pain-free is the same for the two groups.” 


H0: p1 = p2 (claim); Ha: p1 ≠ p2

(b) -z0 = -1.96, z0 = 1.96;
Rejection regions: z < -1.96, z > 1.96

(c) 5.62 (Tech: 5.58) (d) Reject H0.

(e) There is enough evidence at the 5% level of 
significance to reject the claim that the proportion of 
subjects who are pain-free is the same for the two 
groups. 

15. (a) The claim is “the proportion of motorcyclists who wear 
a helmet is now greater.” 


H0: p1 = p2; Ha: p1 > p2 (claim)
(b) z0 = 1.645; Rejection region: z > 1.645
(c) 1.37 (d) Fail to reject H0.
(e) There is not enough evidence at the 5% level of 


significance to support the claim that the proportion 
of motorcyclists who wear a helmet is now greater. 


17. (a) The claim is “the proportion of Internet users is the 
same for the two age groups.” 


H0: p1 = p2 (claim); Ha: p1 ≠ p2
(b) -z0 = -2.575, z0 = 2.575;
Rejection regions: z < -2.575, z > 2.575
(c) 5.31 (d) Reject H0.
(e) There is enough evidence at the 1% level of 


significance to reject the claim that the proportion of 
Internet users is the same for the two age groups. 


19. There is enough evidence at the 5% level of significance to 
reject the claim that the proportion of customers who wait 
20 minutes or less is the same at the Fairfax North and 
Fairfax South offices. 


21. There is enough evidence at the 10% level of significance 
to support the claim that the proportion of customers who 
wait 20 minutes or less at the Roanoke office is less than 
the proportion of customers who wait 20 minutes or less at 
the Staunton office. 


23. No; When α = 0.01, the rejection region becomes
z < -2.33. Because -2.02 > -2.33, you fail to reject H0.
There is not enough evidence at the 1% level of 
significance to support the claim that the proportion of 
customers who wait 20 minutes or less at the Roanoke 
office is less than the proportion of customers who wait 
20 minutes or less at the Staunton office. 
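
The change in the rejection region described above can be checked with the inverse of the standard normal cumulative distribution. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `left_tail_critical_z` is hypothetical.

```python
from statistics import NormalDist


def left_tail_critical_z(alpha: float) -> float:
    """Critical value z0 for a left-tailed z-test: the area to the left of z0 equals alpha."""
    return NormalDist().inv_cdf(alpha)


z = -2.02  # standardized test statistic quoted in the answer above
z0 = left_tail_critical_z(0.01)

print(round(z0, 2))
print("reject H0" if z < z0 else "fail to reject H0")
```

Calling the same helper with alpha = 0.10 gives a critical value that lies to the right of -2.02, which is why the decision at the 10% level was to reject H0.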
25. Hypothesis test results:
p1: proportion of successes for population 1
p2: proportion of successes for population 2
p1 - p2: difference in proportions
H0: p1 - p2 = 0
Ha: p1 - p2 > 0

Difference | Count1 | Total1 | Count2 | Total2
p1 - p2 | 7501 | 13300 | 8120 | 14500

Sample Diff. | Std. Err. | Z-Stat | P-value
0.0039849626 | 0.0059570055 | 0.66895396 | 0.2518

P = 0.2518 > 0.05, so fail to reject H0.


There is not enough evidence at the 5% level of 
significance to support the claim that the proportion of 
men ages 18 to 24 living in their parents’ homes was 
greater in 2000 than in 2009. 
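
The Std. Err., Z-Stat, and P-value columns in the output above can be reproduced from the four counts. The sketch below uses the pooled-proportion form of the two-sample z-test; the function name `two_proportion_z` is hypothetical, and the standard library's `NormalDist` supplies the normal curve area.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Return (z, standard error) for H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled (weighted) sample proportion
    std_err = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_err, std_err


# Counts and totals from the output above.
z, std_err = two_proportion_z(7501, 13300, 8120, 14500)
p_value = 1 - NormalDist().cdf(z)  # right-tailed, because Ha: p1 - p2 > 0

print(round(std_err, 7), round(z, 4), round(p_value, 4))
```

For a two-tailed alternative, the P-value would instead be twice the smaller tail area.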


27. Hypothesis test results:
p1: proportion of successes for population 1
p2: proportion of successes for population 2
p1 - p2: difference in proportions
H0: p1 - p2 = 0
Ha: p1 - p2 ≠ 0

Difference | Count1 | Total1 | Count2 | Total2
p1 - p2 | 7501 | 13300 | 5610 | 13200

Sample Diff. | Std. Err. | Z-Stat | P-value
0.13898496 | 0.006142657 | 22.626196 | <0.0001

P < 0.0001 < 0.01, so reject H0.


There is enough evidence at the 1% level of significance to 
reject the claim that the proportion of 18- to 24-year-olds 
living in their parents’ homes in 2000 was the same for men 
and women. 


29. -0.028 < p1 - p2 < -0.012


Uses and Abuses for Chapter 8 (page 469) 


1. Answers will vary. 


2. Blind: The patients do not know which group (medicine or 
placebo) they belong to. 


Double Blind: Neither the researcher nor the patient knows
which group (medicine or placebo) the patient belongs to.


Review Answers for Chapter 8 (page 471) 


1. Dependent because the same cities were sampled. 

3. Fail to reject H0. There is not enough evidence at the 5%
level of significance to reject the claim.

5. Reject H0. There is enough evidence at the 10% level of
significance to support the claim.

7. (a) The claim is “the Wendy’s fish sandwich has less
sodium than the Long John Silver’s fish sandwich.”
H0: μ1 = μ2; Ha: μ1 < μ2 (claim)
(b) z0 = -1.645; Rejection region: z < -1.645
(c) -9.20 (d) Reject H0.
(e) There is enough evidence at the 5% level of
significance to support the claim that the Wendy’s fish
sandwich has less sodium than the Long John Silver’s
fish sandwich.

9. Yes; The new rejection region is z < -2.33, which contains
z = -9.20, so you still reject H0.

11. Reject H0. There is enough evidence at the 5% level of
significance to reject the claim.

13. Fail to reject H0. There is not enough evidence at the 5%
level of significance to reject the claim.

15. Reject H0. There is enough evidence at the 1% level of
significance to support the claim.

17. (a) The claim is “third graders taught with the directed
reading activities scored higher than those taught
without the activities.”
H0: μ1 = μ2; Ha: μ1 > μ2 (claim)
(b) t0 = 1.645; Rejection region: t > 1.645
(c) 2.267 (d) Reject H0.
(e) There is enough evidence at the 5% level of
significance to support the claim that third graders
taught with the directed reading activities scored higher
than those taught without the activities.

19. Two-tailed test; Reject H0.

21. Right-tailed test; Reject H0.

23. (a) The claim is “the men’s systolic blood pressure
decreased.”
H0: μd = 0; Ha: μd > 0 (claim)
(b) t0 = 1.383; Rejection region: t > 1.383
(c) d̄ = 5; s_d ≈ 8.743 (d) 1.808 (e) Reject H0.
(f) There is enough evidence at the 10% level of
significance to support the claim that the men’s
systolic blood pressure decreased.

25. Can use normal sampling distribution; Fail to reject H0.
27. Can use normal sampling distribution; Reject H0.

29. (a) The claim is “the proportions of U.S. adults who
considered the amount of federal income tax they had
to pay to be too high were the same for the two years.”
H0: p1 = p2 (claim); Ha: p1 ≠ p2
(b) -z0 = -2.575, z0 = 2.575;
Rejection regions: z < -2.575, z > 2.575
(c) 2.65 (d) Reject H0.
(e) There is enough evidence at the 1% level of
significance to reject the claim that the proportions
of U.S. adults who considered the amount of federal
income tax they had to pay to be too high were the
same for the two years.



31. Yes; When α = 0.05, the rejection regions become
z < -1.96 and z > 1.96. Because 2.65 > 1.96, you still
reject H0. There is enough evidence at the 5% level of
significance to reject the claim that the proportions of
U.S. adults who considered the amount of federal income
tax they had to pay to be too high were the same for the
two years.


Chapter Quiz for Chapter 8 (page 475) 
1. (a) H0: μ1 ≤ μ2; Ha: μ1 > μ2 (claim)

(b) One-tailed because Ha contains >; z-test because n1
and n2 are each greater than 30.

(c) z0 = 1.645; Rejection region: z > 1.645

(d) 0.585 (e) Fail to reject H0.

(f) There is not enough evidence at the 5% level of 
significance to support the claim that the mean score 
on the science assessment for the male high school 
students was higher than for the female high school 
students. 


2. (a) H0: μ1 = μ2 (claim); Ha: μ1 ≠ μ2

(b) Two-tailed because Ha contains ≠; t-test because n1
and n2 are less than 30, the samples are independent,
and the populations are normally distributed.

(c) -t0 = -2.779, t0 = 2.779;
Rejection regions: t < -2.779, t > 2.779

(d) 0.341 (e) Fail to reject H0.

(f) There is not enough evidence at the 1% level of 
significance to reject the teacher’s claim that the mean 
scores on the science assessment test are the same for 
fourth grade boys and girls. 

3. (a) H0: p1 = p2 (claim); Ha: p1 ≠ p2

(b) Two-tailed because Ha contains ≠; z-test because you
are testing proportions and n1p̄, n1q̄, n2p̄, and
n2q̄ ≥ 5.

(c) -z0 = -1.645, z0 = 1.645;
Rejection regions: z < -1.645, z > 1.645

(d) 1.32 (e) Fail to reject H0.

(f) There is not enough evidence at the 10% level of
significance to reject the claim that the proportion of
U.S. adults who are worried that they or someone in
their family will become a victim of terrorism has not
changed.

4. (a) H0: μd = 0; Ha: μd < 0 (claim)

(b) One-tailed because Ha contains <; t-test because both
populations are normally distributed and the samples
are dependent.

(c) t0 = -2.718; Rejection region: t < -2.718

(d) -5.07 (e) Reject H0.

(f) There is enough evidence at the 1% level of 


significance to support the claim that the seminar helps 
adults increase their credit scores. 
Real Statistics-Real Decisions for Chapter 8 (page 476)


1. 


Cumulative Review Chapters 6-8 
1. 


NAHM Bb WwW 


13. 


. (a) Answers will vary. 


(a) Answers will vary. Sample answer: Divide the records 
into groups according to the inpatients’ ages, and then 
randomly select records from each group. 


(b) Answers will vary. Sample answer: Divide the records 
into groups according to geographic regions, and then 
randomly select records from each group. 


(c) Answers will vary. Sample answer: Assign a different 
number to each record, randomly choose a starting 
number, and then select every 50th record. 


(d) Answers will vary. Sample answer: Assign a different 
number to each record, and then use a table of random 
numbers to generate a sample of numbers. 


(b) Answers will vary. 


. Use a t-test; independent; yes, you need to know if the 


population distributions are normal or not; yes, you need
to know if the population variances are equal or not. 


. There is not enough evidence at the 10% level of 


significance to support the claim that there is a difference 
in the mean length of hospital stays for inpatients. 


This decision does not support the claim.


(page 480) 


1. (a) (0.109, 0.151)
(b) There is enough evidence at the 5% level of
significance to support the researcher’s claim that more
than 10% of people who attend community college are
age 40 or older.

2. There is enough evidence at the 10% level of significance
to support the claim that the fuel additive improved gas
mileage.

3. (25.94, 28.00); z-distribution
4. (2.75, 4.17); t-distribution
5. (10.7, 13.5); t-distribution
6. (7.69, 8.73); t-distribution

7. There is enough evidence at the 10% level of significance
to support the pediatrician’s claim that the mean birth
weight of a single-birth baby is greater than the mean birth
weight of a baby that has a twin.

8. H0: μ = 33; Ha: μ < 33 (claim)
9. H0: p = 0.19 (claim); Ha: p < 0.19
10. H0: σ = 0.63 (claim); Ha: σ ≠ 0.63
11. H0: μ = 2.28; Ha: μ ≠ 2.28 (claim)

12. (a) (5.1, 22.8)
(b) (2.3, 4.8)
(c) There is not enough evidence at the 1% level of
significance to support the pharmacist’s claim that the
standard deviation of the mean number of chronic
medications taken by elderly adults in the community
is less than 2.5 medications.

13. There is enough evidence at the 5% level of significance to
support the organization’s claim that the mean SAT scores
for male athletes and male non-athletes at a college are
different.
14. (a) (37,732.2, 40,060.7)


(b) There is not enough evidence at the 5% level of 
significance to reject the claim that the mean annual 
earnings for translators is $40,000. 


15. There is not enough evidence at the 10% level of 
significance to reject the claim that the proportions of 
players sustaining head and neck injuries are the same 
for the two groups.


16. (a) (41.5, 42.5) 


(b) There is enough evidence at the 5% level of 
significance to reject the zoologist’s claim that the 
mean incubation period for ostriches is at least 45 days. 


CHAPTER 9 


Section 9.1 (page 495) 


1. Increase 


3. The range of values for the correlation coefficient is —1 
to 1, inclusive. 


5. Answers will vary. Sample answer: 


Perfect positive linear correlation: price per gallon of 
gasoline and total cost of gasoline 


Perfect negative linear correlation: distance from door and 
height of wheelchair ramp 
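
The range of r and the two perfect-correlation cases above can be illustrated numerically. The sketch below computes the sample correlation coefficient from its deviation form. All data values are hypothetical: the total cost is for a fixed 10 gallons, and "money left from a $50 budget" is a stand-in for a perfectly negative relationship, not the wheelchair ramp example from the answer.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


prices = [2.50, 3.00, 3.50, 4.00]         # hypothetical prices per gallon
total_cost = [10 * p for p in prices]     # cost of a fixed 10 gallons
money_left = [50 - c for c in total_cost]  # remainder of a hypothetical $50 budget

r_positive = pearson_r(prices, total_cost)
r_negative = pearson_r(prices, money_left)
print(round(r_positive, 6), round(r_negative, 6))
```

Swapping the two arguments of `pearson_r` leaves the result unchanged, because the formula treats x and y symmetrically.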


7. r is the sample correlation coefficient, while ρ is the
population correlation coefficient.


9. Negative linear correlation 
11. Perfect negative linear correlation 
13. Positive linear correlation 


15. c; You would expect a positive linear correlation between 
age and income. 


16. d; You would not expect age and height to be correlated. 


17. b; You would expect a negative linear correlation between 
age and balance on student loans. 


18. a; You would expect the relationship between age and body 
temperature to be fairly constant. 


19. Explanatory variable: Amount of water consumed 
Response variable: Weight loss 
21. (a) [Scatter plot: systolic blood pressure versus age (in years)]
(b) 0.908
(c) Strong positive linear correlation


23. (a) [Scatter plot: test score versus hours studying]
(b) 0.923 (c) Strong positive linear correlation

25. (a) [Scatter plot; x-axis: Budget (in millions of dollars)]
(b) 0.604 (c) Weak positive linear correlation

27. (a) [Scatter plot: dividends per share versus earnings per share]
(b) 0.828 (c) Strong positive linear correlation

29. The correlation coefficient becomes r ≈ 0.621. The new
data entry is an outlier, so the linear correlation is weaker.

31. There is not enough evidence at the 1% level of
significance to conclude that there is a significant linear
correlation between vehicle weight and the variability in
braking distance.

33. There is enough evidence at the 1% level of significance
to conclude that there is a significant linear correlation
between the number of hours spent studying for a test and
the score received on the test.

35. There is enough evidence at the 1% level of significance
to conclude that there is a significant linear correlation
between earnings per share and dividends per share.

37. (a) [Scatter plot: depth versus magnitude, x]
(b) 0.848
(c) Reject H0. There is enough evidence at the 1% level of
significance to conclude that there is a significant linear
correlation between the magnitudes of earthquakes
and their depths below the surface at the epicenter.

39. The correlation coefficient becomes r ≈ 0.085. The new
rejection regions are t < -3.499 and t > 3.499, and the
new standardized test statistic is t ≈ 0.227. So, you now fail
to reject H0.

41. 0.883; 0.883; The correlation coefficient remains unchanged
when the x-values and y-values are switched.

43. Answers will vary.

Activity 9.1 (page 500)

1-4. Answers will vary.

Section 9.2 (page 505)

1. A residual is the difference between the observed y-value
of a data point and the predicted y-value on the regression
line for the x-coordinate of the data point. A residual is
positive when the data point is above the line, negative
when the point is below the line, and zero when the
observed y-value equals the predicted y-value.

3. Substitute a value of x into the equation of a regression
line and solve for y.

5. The correlation between variables must be significant.

7. b 8. a 9. e 10. c 11. f 12. d
13. c 14. b 15. a 16. d

17. ŷ = 0.065x + 0.465
[Scatter plot with regression line; x-axis: Height (in feet)]
(a) 52 stories (b) 49 stories
(c) It is not meaningful to predict the value of y for
x = 400 because x = 400 is outside the range of the
original data.
(d) 41 stories

19. ŷ = 7.350x + 34.617
[Scatter plot with regression line: test score versus hours studying]
(a) 57 (b) 82
(c) It is not meaningful to predict the value of y for
x = 13 because x = 13 is outside the range of the
original data.
(d) 68

21. ŷ = 2.472x + 80.813
[Scatter plot with regression line: sodium (in milligrams) versus calories]
(a) 501.053 milligrams
(b) 328.013 milligrams
(c) [illegible]
(d) It is not meaningful to predict the value of y for
x = 210 because x = 210 is outside the range of the
original data.

23. ŷ = 1.870x + 51.360
[Scatter plot with regression line; x-axis: Shoe size]
(a) 72.865 inches
(b) 66.32 inches
(c) It is not meaningful to predict the value of y for
x = 15.5 because x = 15.5 is outside the range of
the original data.
(d) 70.06 inches

25. Strong positive linear correlation; As the years of
experience of the registered nurses increase, their salaries
tend to increase.

27. No, it is not meaningful to predict a salary for a registered
nurse with 28 years of experience because x = 28 is
outside the range of the original data.

29. Answers will vary. Sample answer: Although it is likely that
there is a cause-and-effect relationship between a registered
nurse’s years of experience and salary, you cannot use
significant correlation to claim cause and effect. The
relationship between the variables may also be influenced
by other factors, such as work performance, level of
education, or the number of years with an employer.

31. (a) ŷ = -0.159x + 5.827
(b) -0.852
(c) [Fitted line plot: earned run average, y, versus wins, x]


33. (a) ŷ = -4.297x + 94.200
[Scatter plot with regression line: Row 2 versus Row 1]

(b) ŷ = -0.141x + 14.763
[Scatter plot with regression line: Row 1 versus Row 2]


(c) The slope of the line keeps the same sign, but the 
values of m and b change. 


35. (a) ŷ = 0.139x + 21.024
(b) {| 


(d) The residual plot shows a pattern because the 
residuals do not fluctuate about 0. This implies that 
the regression line is not a good representation of the 
relationship between the two variables. 


37. (a) ” 


39. 


41. 


43. 


45. 


47. 


n ao 


1020 30 4050 

(b) The point (44, 8) may be an outlier. 

(c) The point (44, 8) is not an influential point because the 
slopes and y-intercepts of the regression lines with the 
point included and without the point included are not 
significantly different. 


y = 654.536x — 1214.857 


Number of bacteria 


1234567 
Number of hours 


y = 93,028(1.712)* 


5000 


0 8 


y = —78.929x + 576.179 


A+ e 


n 
T 
12 3 


y = 782.300x 17! 


750 


50 


y = 25.035 + 19.599 In x 
716 

& 

= 


Shoe size 
49. The logarithmic equation is a better model for the data.
The graph of the logarithmic equation fits the data better 
than the regression line. 


Activity 9.2 (page 511) 


1-4. Answers will vary. 


Section 9.3. (page 519) 


1. The total variation is the sum of the squares of the
differences between the y-values of each ordered pair
and the mean of the y-values of the ordered pairs, or
Σ(y_i - ȳ)².

The unexplained variation is the sum of the squares of
the differences between the observed y-values and the
predicted y-values, or Σ(y_i - ŷ_i)².


» 


5. Two variables that have perfect positive or perfect negative 
linear correlation have a correlation coefficient of 1 or —1, 
respectively. In either case, the coefficient of determination 


is 1, which means that 100% of the variation in the 
response variable is explained by the variation in the 
explanatory variable. 

7. 0.216; About 21.6% of the variation is explained. About 
78.4% of the variation is unexplained. 

9. 0.916; About 91.6% of the variation is explained. About 
8.4% of the variation is unexplained. 


11. (a) 0.798; About 79.8% of the variation in proceeds can be 
explained by the variation in the number of issues, and 


about 20.2% of the variation is unexplained. 


(b) 8064.633; The standard error of estimate of the 
proceeds for a specific number of issues is about 
$8,064,633,000. 


13. (a) 0.981; About 98.1% of the variation in sales can be 


explained by the variation in the total square footage, 


and about 1.9% of the variation is unexplained. 


(b) 30.576; The standard error of estimate of the sales for a 


specific total square footage is about $30,576,000,000. 


(a) 0.963; About 96.3% of the variation in wages for 
federal government employees can be explained by 


15 


the variation in wages for state government employees, 


and about 3.7% of the variation is unexplained. 


(b) 20.090; The standard error of estimate of the average 
weekly wages for federal government employees for 
a specific average weekly wage for state government 
employees is about $20.09. 


17. (a) 0.790; About 79.0% of the variation in the gross 


collections of corporate income taxes can be explained 


by the variation in the gross collections of individual 
income taxes, and about 21.0% of the variation is 
unexplained. 

(b) 42.386; The standard error of estimate of the gross 
collections of corporate income taxes for a specific 


gross collection of individual income taxes is about 
$42,386,000,000. 
19. 40,116.824 < y < 82,624.318
You can be 95% confident that the proceeds will be
between $40,116,824,000 and $82,624,318,000 when the
number of initial offerings is 450 issues.

21. 1218.435 < y < 1336.829
You can be 90% confident that the shopping center sales
will be between $1,218,435,000 and $1,336,829,000 when
the total square footage of shopping centers is
5,750,000,000.

23. 1007.82 < y < 1208.228
You can be 99% confident that the average weekly wages
of federal government employees will be between $1007.82
and $1208.23 when the average weekly wages of state
government employees is $800.

25. 213.729 < y < 450.519
You can be 95% confident that the corporate income
taxes collected by the U.S. Internal Revenue Service for
a given year will be between $213,729,000,000 and
$450,519,000,000 when the U.S. Internal Revenue Service
collects $1,250,000,000 in individual income taxes that year.

27. [Graph]

29.
x_i | y_i | ŷ_i | ŷ_i - ȳ | y_i - ŷ_i | y_i - ȳ
9.4 | 7.6 | 7.1252 | 0.4372 | 0.4748 | 0.912
9.2 | 6.9 | 7.0736 | 0.3856 | -0.1736 | 0.212
8.9 | 6.6 | 6.9962 | 0.3082 | -0.3962 | -0.088
8.4 | 6.8 | 6.8672 | 0.1792 | -0.0672 | 0.112
8.3 | 6.9 | 6.8414 | 0.1534 | 0.0586 | 0.212
6.5 | 6.5 | 6.377 | -0.311 | 0.123 | -0.188
6 | 6.3 | 6.248 | -0.44 | 0.052 | -0.388
4.9 | 5.9 | 5.9642 | -0.7238 | -0.0642 | -0.788

31. 0.746; About 74.6% of the variation in the median ages
of trucks in use can be explained by the variation in the
median ages of cars in use, and about 25.4% of the
variation is unexplained.

33. 5.792 < y < 7.22
You can be 95% confident that the median age of trucks in
use will be between 5.792 and 7.22 years when the median
age of cars in use is 7.0 years.

35. (a) 0.671 (b) 1.780 (c) 9.537 < y < 19.010

37. Fail to reject H0. There is not enough evidence at the 1%
level of significance to support the claim that there is a
linear relationship between weight and number of hours
slept.

39. -118.927 < B < 323.505
110.911 < M < 281.393
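
The coefficient of determination 0.746 reported above for the median ages of trucks and cars can be recovered from the deviations table, as the ratio of explained variation to total variation. The sketch below takes the observed and predicted y-values directly from that table; the variable names are illustrative only.

```python
# Observed and predicted y-values copied from the deviations table above.
y = [7.6, 6.9, 6.6, 6.8, 6.9, 6.5, 6.3, 5.9]
y_hat = [7.1252, 7.0736, 6.9962, 6.8672, 6.8414, 6.377, 6.248, 5.9642]

y_bar = sum(y) / len(y)  # mean of the observed y-values

explained = sum((yh - y_bar) ** 2 for yh in y_hat)           # sum of (y_hat - y_bar)^2
total = sum((yi - y_bar) ** 2 for yi in y)                   # sum of (y - y_bar)^2
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of (y - y_hat)^2

r_squared = explained / total
print(round(r_squared, 3))
```

Because the predicted values are rounded, the explained and unexplained variations add up to the total variation only approximately.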
Section 9.4 (page 527)

1. (a) 39,103.5 pounds per acre
(b) 39,939.1 pounds per acre
(c) 38,063.5 pounds per acre
(d) 39,052.4 pounds per acre

3. (a) 7.5 cubic feet (b) 16.8 cubic feet
(c) 51.9 cubic feet (d) 62.1 cubic feet

5. ŷ = -2518.364 + 126.822x1 + 66.360x2
(a) 28.489; The standard error of estimate of the predicted
sales given specific total square footage and number of
shopping centers is about $28.489 billion.
(b) 0.985; The multiple regression model explains about
98.5% of the variation in y.

7. ŷ = -2518.364 + 126.822x1 + 66.360x2; The equation is
the same.

9. 0.981; About 98.1% of the variation in y can be explained
by the relationship between variables; r²adj < r².


Uses and Abuses for Chapter 9 (page 529) 


1. Answers will vary. 2. Answers will vary.


Review Answers for Chapter 9 (page 531) 


1. [Scatter plot: passing yards versus pass attempts]
0.912; strong positive linear correlation; the number of
passing yards increases as the number of pass attempts
increases.

3. [Scatter plot: brain size versus IQ]
0.338; weak positive linear correlation; brain size
increases as IQ increases.

5. There is not enough evidence at the 1% level of
significance to conclude that there is a significant linear
correlation.

7. There is enough evidence at the 5% level of significance
to conclude that there is a significant linear correlation
between a quarterback’s pass attempts and passing yards.

9. There is not enough evidence at the 1% level of
significance to conclude that there is a significant linear
correlation between IQ and brain size.

11. ŷ = 0.038x - 3.529; r ≈ 0.821
[Scatter plot with regression line: average price per gallon
(in dollars) versus amount of milk (in billions of pounds)]

13. ŷ = -0.086x + 10.450; r ≈ -0.949
[Scatter plot with regression line: hours of sleep versus
age (in years)]

15. (a) It is not meaningful to predict the value of y for
x = 160 because x = 160 is outside the range of the
original data.
(b) $3.12 (c) $3.31
(d) It is not meaningful to predict the value of y for
x = 200 because x = 200 is outside the range of the
original data.

17. (a) It is not meaningful to predict the value of y for
x = 18 because x = 18 is outside the range of the
original data.
(b) 8.3 hours
(c) It is not meaningful to predict the value of y for
x = 85 because x = 85 is outside the range of the
original data.
(d) 6.15 hours

19. 0.203; About 20.3% of the variation is explained. About
79.7% of the variation is unexplained.

21. 0.412; About 41.2% of the variation is explained. About
58.8% of the variation is unexplained.

23. (a) 0.679; About 67.9% of the variation in the fuel
efficiency of the compact sports sedans can be
explained by the variation in their prices, and about
32.1% of the variation is unexplained.
(b) 1.138; The standard error of estimate of the fuel
efficiency of the compact sports sedans for a specific
price of the compact sports sedans is about 1.138 miles
per gallon.

25. 2.997 < y < 4.025
You can be 90% confident that the price per gallon of milk
will be between $3.00 and $4.03 when 185 billion pounds of
milk is produced.
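
The prediction answers above combine two steps: substitute x into the regression equation, and decline to predict when x lies outside the range of the original data. The sketch below applies both steps to the hours-of-sleep line ŷ = -0.086x + 10.450. Two things are assumptions rather than facts from the answer key: the data range of 20 to 80 years (read off the plot's axis labels) and the inputs 25 and 50, which were inferred by working backward from the printed answers 8.3 and 6.15 hours.

```python
from typing import Optional


def predict_sleep(age: float, x_min: float = 20, x_max: float = 80) -> Optional[float]:
    """Predict hours of sleep from age, or return None when age is outside the data range.

    x_min and x_max are assumed bounds, not values stated in the answer key.
    """
    if not (x_min <= age <= x_max):
        return None  # extrapolation: the prediction would not be meaningful
    return -0.086 * age + 10.450


for age in (18, 25, 85, 50):
    y_hat = predict_sleep(age)
    print(age, "not meaningful" if y_hat is None else round(y_hat, 2))
```

Returning `None` instead of a number forces the caller to handle the out-of-range case explicitly, which mirrors the wording of parts (a) and (c).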
27. 4.865 < y < 8.295
You can be 95% confident that the hours slept will be
between 4.865 and 8.295 hours for a person who is
45 years old.

29. 16.119 < y < 25.137
You can be 99% confident that the fuel efficiency of the
compact sports sedan that costs $39,900 will be between
16.119 and 25.137 miles per gallon.

31. ŷ = 3.674 + 1.287x1 - 7.531x2

33. (a) 21.705 (b) 25.21 (c) 30.1 (d) 25.86


Chapter Quiz for Chapter 9 (page 535) 


1. [Scatter plot: average annual salary for public school
classroom teachers versus average annual salary for public
school principals (in thousands of dollars)]
The data appear to have a positive linear correlation.
As x increases, y tends to increase.

2. 0.993; strong positive linear correlation; public school
classroom teachers’ salaries increase as public school
principals’ salaries increase.

3. Reject H0. There is enough evidence at the 5% level of
significance to conclude that there is a significant linear
correlation between public school principals’ salaries and
public school classroom teachers’ salaries.

4. ŷ = 0.491x + 5.977
[Scatter plot with regression line, same axes as in Exercise 1]

5. $50,412.50


6. 0.986; About 98.6% of the variation in the average annual 
salaries of public school classroom teachers can be 
explained by the variation in the average annual salaries 
of public school principals, and about 1.4% of the variation 
is unexplained. 


7. 0.490; The standard error of estimate of the average 
annual salary of public school classroom teachers for 
a specific average annual salary of public school principals 
is about $490. 
8. 46.887 < y < 49.273


You can be 95% confident that the average annual salary of 
public school classroom teachers will be between $46,887 
and $49,273 when the average annual salary of public school 
principals is $85,750. 
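
A prediction interval is centered on the point estimate ŷ, so the interval above can be cross-checked against the regression line from answer 4. The following is a minimal sketch of that check, not part of the original key; the helper name `predict` is hypothetical, and the numbers are the rounded values printed in this quiz (in thousands of dollars).

```python
# Regression line printed in answer 4: y-hat = 0.491x + 5.977
# (x and y in thousands of dollars).
SLOPE = 0.491
INTERCEPT = 5.977


def predict(x):
    """Point estimate y-hat for a principals' salary x (hypothetical helper)."""
    return SLOPE * x + INTERCEPT


# Interval endpoints and x-value as printed in answer 8.
lower, upper = 46.887, 49.273
x0 = 85.75

y_hat = predict(x0)                # center of the prediction interval
midpoint = (lower + upper) / 2     # midpoint of the printed interval
margin = (upper - lower) / 2       # margin of error E implied by the interval

print(round(y_hat, 3), round(midpoint, 3), round(margin, 3))
```

If the two centers agree to within rounding, the interval and the regression equation are consistent with each other.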


9. (a) $59.30 
(b) $30.53 
(c) $45.67 
(d) $35.83 


Real Statistics-Real Decisions for Chapter 9 (page 536)


1. (a) [Scatter plot; x-axis: Sulfur dioxide emissions (in millions of tons); y-axis: Nitrogen oxides emissions (in millions of tons)]

It appears that there is a positive linear correlation.
As the sulfur dioxide emissions increase, the nitrogen
oxides emissions increase.

(b) r ≈ 0.947; There is a strong positive linear correlation.

(c) There is enough evidence at the 5% level of
significance to conclude that there is a significant linear
correlation between sulfur dioxide emissions and
nitrogen oxides emissions.


(d) ŷ = 0.652x - 2.438

[Scatter plot with regression line; x-axis: Sulfur dioxide emissions (in millions of tons); y-axis: Nitrogen oxides emissions (in millions of tons)]

Yes, the line appears to be a good fit.

(e) Yes, for x-values that are within the range of the data
set.


(f) r² ≈ 0.898; About 89.8% of the variation in nitrogen
oxides emissions can be explained by the variation in
sulfur dioxide emissions, and about 10.2% of the
variation is unexplained.

(g) sₑ ≈ 0.368; The standard error of estimate of nitrogen
oxides emissions for a specific sulfur dioxide emission is
about 368,000 tons.


2. 1.358 < y < 3.286

You can be 95% confident that the nitrogen oxides
emissions will be between 1.358 and 3.286 million tons when
the sulfur dioxide emissions are 17.3 - 10 = 7.3
million tons.
CHAPTER 10


Section 10.1 (page 546) 


1. A multinomial experiment is a probability experiment
consisting of a fixed number of independent trials in which
there are more than two possible outcomes for each trial.
The probability of each outcome is fixed, and each outcome
is classified into categories.


» 45 5.5759


7. (a) H₀: The distribution of the ages of moviegoers is 26.7%
ages 2-17, 19.8% ages 18-24, 19.7% ages 25-39, 14%
ages 40-49, and 19.8% ages 50+. (claim)
Hₐ: The distribution of ages differs from the claimed
or expected distribution.

(b) χ²₀ = 7.779; Rejection region: χ² > 7.779

(c) 7.256

(d) Fail to reject H₀.

(e) There is not enough evidence at the 10% level of
significance to reject the claim that the distribution of the
ages of moviegoers and the claimed or expected
distribution are the same.


9. (a) H₀: The distribution of the days people order food for
delivery is 7% Sunday, 4% Monday, 6% Tuesday,
13% Wednesday, 10% Thursday, 36% Friday, and
24% Saturday.
Hₐ: The distribution of days differs from the claimed or
expected distribution. (claim)

(b) χ²₀ = 16.812; Rejection region: χ² > 16.812

(c) 17.595

(d) Reject H₀.

(e) There is enough evidence at the 1% level of
significance to conclude that there has been a change in
the claimed or expected distribution.


11. (a) H₀: The distribution of the number of homicide crimes
in California by season is uniform. (claim)
Hₐ: The distribution of homicides by season is not
uniform.

(b) χ²₀ = 7.815; Rejection region: χ² > 7.815

(c) 0.727

(d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to reject the claim that the distribution of the
number of homicide crimes in California by season is
uniform.


13. (a) H₀: The distribution of the opinions of U.S. parents
on whether a college education is worth the expense is
55% strongly agree, 30% somewhat agree, 5% neither
agree nor disagree, 6% somewhat disagree, and
4% strongly disagree.
Hₐ: The distribution of opinions differs from the
claimed or expected distribution. (claim)

(b) χ²₀ = 9.488; Rejection region: χ² > 9.488


(c) 65.236

(d) Reject H₀.

(e) There is enough evidence at the 5% level of
significance to conclude that the distribution of the
opinions of U.S. parents on whether a college education
is worth the expense differs from the claimed or
expected distribution.


15. (a) H₀: The distribution of prospective home buyers by the
size they want their next house to be is uniform. (claim)
Hₐ: The distribution of prospective home buyers by the
size they want their next house to be is not uniform.

(b) χ²₀ = 5.991; Rejection region: χ² > 5.991

(c) 10.308

(d) Reject H₀.

(e) There is enough evidence at the 5% level of
significance to reject the claim that the distribution of
prospective home buyers by the size they want their
next house to be is uniform.


17. Chi-Square goodness-of-fit results:
Observed: Recent survey
Expected: Previous survey

N   | DF | Chi-Square | P-Value
400 | 9  | 18.637629  | 0.0285

P = 0.0285, so reject H₀. There is enough evidence at the
10% level of significance to conclude that there has been
a change in the claimed or expected distribution of
U.S. adults’ favorite sports.


19. (a) The expected frequencies are 17, 63, 79, 34, and 5.

(b) χ²₀ = 13.277; Rejection region: χ² > 13.277

(c) 0.613

(d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to reject the claim that the test scores are
normally distributed.


Section 10.2 (page 557)


1. Find the sum of the row and the sum of the column in
which the cell is located. Find the product of these sums.
Divide the product by the sample size.
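
The rule just stated can be written out as a short computation. This is an illustrative sketch only; the function name `expected_frequencies` is hypothetical, and the observed counts are taken from the injury/stretching contingency table that appears later in this section's answers.

```python
def expected_frequencies(observed):
    """Expected count per cell: (row total * column total) / sample size."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    sample_size = sum(row_totals)
    return [[r * c / sample_size for c in column_totals] for r in row_totals]


# Rows: injury, no injury. Columns: stretched, not stretched.
observed = [[18, 22], [211, 189]]

for row in expected_frequencies(observed):
    print([round(value, 2) for value in row])
```

The rounded results can be compared with the values shown in parentheses in that table (20.82, 19.18, 208.18, 191.82).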


3. Answers will vary. Sample answer: For both the chi-square
test for independence and the chi-square goodness-of-fit
test, you are testing a claim about data that are in
categories. However, the chi-square goodness-of-fit test has
only one data value per category, while the chi-square test
for independence has multiple data values per category.

Both tests compare observed and expected frequencies.
However, the chi-square goodness-of-fit test simply
compares the distributions, whereas the chi-square test for
independence compares them and then draws a conclusion
about the dependence or independence of the variables.


5. False. If the two variables of a chi-square test for
independence are dependent, then you can expect a large
difference between the observed frequencies and the
expected frequencies.
7. (a)-(b)
                       Athlete has
Result    | Stretched    | Not stretched | Total
Injury    | 18 (20.82)   | 22 (19.18)    | 40
No injury | 211 (208.18) | 189 (191.82)  | 400
Total     | 229          | 211           | 440


9. (a)-(b)
                                Preference
Bank employee    | New procedure | Old procedure | No preference | Total
Teller           | 92 (133.80)   | 351 (313.00)  | 50 (46.19)    | 493
Customer service | 76 (34.20)    | 42 (80.00)    | 8 (11.81)     | 126
Total            | 168           | 393           | 58            | 619


11. (a)-(b)
                             Type of car
Gender | Compact   | Full-size  | SUV        | Truck/van | Total
Male   | 28 (28.6) | 39 (39.05) | 21 (22.55) | 22 (19.8) | 110
Female | 24 (23.4) | 32 (31.95) | 20 (18.45) | 14 (16.2) | 90
Total  | 52        | 71         | 41         | 36        | 200


13. (a) H₀: Skill level in a subject is independent of location.
(claim)
Hₐ: Skill level in a subject is dependent on location.

(b) d.f. = 2; χ²₀ = 9.210; Rejection region: χ² > 9.210

(c) 0.297

(d) Fail to reject H₀. There is not enough evidence at the
1% level of significance to reject the claim that skill
level in a subject is independent of location.


15. (a) H₀: The number of times former smokers tried to quit
is independent of gender.
Hₐ: The number of times former smokers tried to quit
is dependent on gender. (claim)

(b) d.f. = 2; χ²₀ = 5.991; Rejection region: χ² > 5.991

(c) 0.002

(d) Fail to reject H₀. There is not enough evidence at the
5% level of significance to conclude that the number
of times former smokers tried to quit is dependent
on gender.


17. (a) H₀: Results are independent of the type of treatment.
Hₐ: Results are dependent on the type of treatment.
(claim)

(b) d.f. = 1; χ²₀ = 2.706; Rejection region: χ² > 2.706

(c) 5.106

(d) Reject H₀. There is enough evidence at the 10% level
of significance to conclude that results are dependent
on the type of treatment. Answers will vary.
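
Each chi-square answer above follows the same right-tailed decision rule: reject the null hypothesis only when the test statistic falls in the rejection region beyond the critical value. A minimal sketch of that rule follows; the function name `right_tailed_decision` and the dictionary labels are hypothetical, and the number pairs are the critical values and test statistics printed in answers 13, 15, and 17 of this section.

```python
def right_tailed_decision(test_statistic, critical_value):
    """Decision for a right-tailed test with rejection region
    test_statistic > critical_value."""
    if test_statistic > critical_value:
        return "Reject H0"
    return "Fail to reject H0"


# (critical value, test statistic) pairs as printed in this section.
examples = {
    "skill level vs. location": (9.210, 0.297),
    "quit attempts vs. gender": (5.991, 0.002),
    "results vs. treatment": (2.706, 5.106),
}

for name, (critical, statistic) in examples.items():
    print(name, "->", right_tailed_decision(statistic, critical))
```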
19. (a) H₀: Reasons are independent of the type of worker.
Hₐ: Reasons are dependent on the type of worker. (claim)

(b) d.f. = 2; χ²₀ = 9.210; Rejection region: χ² > 9.210

(c) 7.326

(d) Fail to reject H₀. There is not enough evidence at the 1%
level of significance to conclude that reasons for continuing
education are dependent on the type of worker. On the
basis of these results, marketing strategies should not
differ between technical and nontechnical audiences in
regard to reasons for continuing education.


21. (a) H₀: Type of crash is independent of the type of vehicle.
Hₐ: Type of crash is dependent on the type of vehicle.
(claim)

(b) d.f. = 2; χ²₀ = 5.991; Rejection region: χ² > 5.991

(c) 144.099

(d) Reject H₀. There is enough evidence at the 5% level
of significance to conclude that the type of crash is
dependent on the type of vehicle.


23. (a)-(b) Contingency table results:
Rows: Expected income
Columns: None

Cell format: Count (Expected count)

Expected income       | More likely | Less likely | Did not make a difference | Did not consider it | Total
Less than $35,000     | 37 (33.33)  | 10 (6.836)  | 22 (27.99)                | 25 (25.85)          | 94
$35,000 to $50,000    | 28 (25.17)  | 12 (5.164)  | 15 (21.14)                | 16 (19.52)          | 71
$50,000 to $100,000   | 55 (62.75)  | 9 (12.87)   | 65 (52.7)                 | 48 (48.68)          | 177
Greater than $100,000 | 36 (34.75)  | 1 (7.127)   | 29 (29.18)                | 32 (26.95)          | 98
Total                 | 156         | 32          | 131                       | 121                 | 440

Statistic  | DF | Value    | P-value
Chi-square | 9  | 26.22966 | 0.0019
(c) Reject H₀. There is enough evidence at the 1% level
of significance to conclude that the decision to borrow
money is dependent on the child’s expected income
after graduation.


25. Fail to reject H₀. There is not enough evidence at the 5%
level of significance to reject the claim that the proportions
of motor vehicle crash deaths involving males or females
are the same for each age group.


27. Right-tailed 


29.
                                  Educational attainment
Status                 | Not a high school graduate | High school graduate | Some college, no degree | Associate’s, bachelor’s, or advanced degree
Employed               | 0.055                      | 0.183                | 0.114                   | 0.290
Unemployed             | 0.006                      | 0.011                | 0.005                   | 0.007
Not in the labor force | 0.073                      | 0.118                | 0.053                   | 0.085


31. Several of the expected frequencies are less than 5. 
33. 45.2% 


35.
                                  Educational attainment
Status                 | Not a high school graduate | High school graduate | Some college, no degree | Associate’s, bachelor’s, or advanced degree
Employed               | 0.411                      | 0.587                | 0.660                   | 0.759
Unemployed             | 0.046                      | 0.036                | 0.030                   | 0.019
Not in the labor force | 0.544                      | 0.377                | 0.311                   | 0.223


37. 4.6% 


39. Answers will vary. Sample answer: As educational
attainment increases, employment increases. 


Section 10.3 (page 571) 


1. Specify the level of significance α. Determine the degrees
of freedom for the numerator and denominator. Use
Table 7 in Appendix B to find the critical value F.


3. (1) The samples must be randomly selected, (2) the 
samples must be independent, and (3) each population 
must have a normal distribution. 


5. 2.54 7. 2.06 9. 9.16 11. 1.80 


13. Fail to reject H₀. There is not enough evidence at the 10%
level of significance to support the claim.


15. Fail to reject H₀. There is not enough evidence at the 1%
level of significance to reject the claim.


17. Reject H₀. There is enough evidence at the 1% level of
significance to reject the claim.


19. (a) H₀: σ₁² ≤ σ₂²; Hₐ: σ₁² > σ₂² (claim)

(b) F₀ = 2.11; Rejection region: F > 2.11

(c) 1.08

(d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to support Company A’s claim that the
variance of the life of its appliances is less than the
variance of the life of Company B’s appliances.


21. (a) H₀: σ₁² = σ₂²; Hₐ: σ₁² ≠ σ₂² (claim)

(b) F₀ = 6.23; Rejection region: F > 6.23

(c) 2.10

(d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to conclude that the variances of the prices
differ between the two companies.


23. (a) H₀: σ₁² = σ₂² (claim); Hₐ: σ₁² ≠ σ₂²

(b) F₀ = 2.635; Rejection region: F > 2.635

(c) 1.282

(d) Fail to reject H₀.

(e) There is not enough evidence at the 10% level of
significance to reject the administrator’s claim that the
standard deviations of science assessment test scores
for eighth grade students are the same in Districts 1
and 2.


25. (a) H₀: σ₁² ≤ σ₂²; Hₐ: σ₁² > σ₂² (claim)

(b) F₀ = 2.35; Rejection region: F > 2.35

(c) 2.41

(d) Reject H₀.

(e) There is enough evidence at the 5% level of
significance to conclude that the standard deviation of
the annual salaries for actuaries is greater in New York
than in California.


27. Hypothesis test results:
σ₁²: variance of population 1
σ₂²: variance of population 2
σ₁²/σ₂²: variance ratio
H₀: σ₁²/σ₂² = 1
Hₐ: σ₁²/σ₂² ≠ 1

Ratio   | n1 | n2 | Sample Ratio | F-Stat    | P-value
σ₁²/σ₂² | 15 | 18 | 0.5281571    | 0.5281571 | 0.2333

P = 0.2333 > 0.10, so fail to reject H₀. There is not
enough evidence at the 10% level of significance to reject
the claim.
29. Hypothesis test results:
σ₁²: variance of population 1
σ₂²: variance of population 2
σ₁²/σ₂²: variance ratio
H₀: σ₁²/σ₂² = 1
Hₐ: σ₁²/σ₂² > 1

Ratio   | n1 | n2 | Sample Ratio | F-Stat   | P-value
σ₁²/σ₂² | 22 | 29 | 2.153926     | 2.153926 | 0.0293

P = 0.0293 < 0.05, so reject H₀. There is enough evidence
at the 5% level of significance to reject the claim.


31. Right-tailed: 14.73
Left-tailed: 0.15


33. (0.375, 3.774)


Section 10.4 (page 581) 


1. H₀: μ₁ = μ₂ = μ₃ = ... = μₖ
Hₐ: At least one of the means is different from the others.


3. The MS_B measures the differences related to the treatment
given to each sample. The MS_W measures the differences
related to entries within the same sample.


5. (a) H₀: μ₁ = μ₂ = μ₃
Hₐ: At least one mean is different from the others.
(claim)

(b) F₀ = 3.37; Rejection region: F > 3.37

(c) 1.02

(d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to conclude that the mean costs per ounce
are different.


7. (a) H₀: μ₁ = μ₂ = μ₃
Hₐ: At least one mean is different from the others.
(claim)

(b) F₀ = 5.49; Rejection region: F > 5.49

(c) 21.99

(d) Reject H₀.

(e) There is enough evidence at the 1% level of
significance to conclude that at least one mean salary
is different.


9. (a) H₀: μ₁ = μ₂ = μ₃ = μ₄ = μ₅
Hₐ: At least one mean is different from the others.
(claim)

(b) F₀ = 4.37; Rejection region: F > 4.37

(c) 12.61

(d) Reject H₀.

(e) There is enough evidence at the 1% level of
significance to conclude that at least one mean cost
per mile is different.
11. (a) H₀: μ₁ = μ₂ = μ₃ = μ₄ (claim)
Hₐ: At least one mean is different from the others.

(b) F₀ = 4.54; Rejection region: F > 4.54

(c) 0.56

(d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance for the company to reject the claim that
the mean number of days patients spend at the hospital
is the same for all four regions.


13. (a) H₀: μ₁ = μ₂ = μ₃ = μ₄
Hₐ: At least one mean is different from the others.
(claim)

(b) F₀ = 2.255; Rejection region: F > 2.255

(c) 3.107

(d) Reject H₀.

(e) There is enough evidence at the 10% level of
significance to conclude that the mean energy
consumption of at least one region is different from
the others.


15. Analysis of Variance results:
Data stored in separate columns.

Column means

Column   | n | Mean   | Std. Error
Grade 9  | 8 | 84.375 | 9.531784
Grade 10 | 8 | 79.25  | 9.090321
Grade 11 | 8 | 76.625 | 7.648383
Grade 12 | 8 | 70.75  | 6.9224014

ANOVA table

Source     | df | SS       | MS        | F-Stat     | P-Value
Treatments | 3  | 771.25   | 257.08334 | 0.45923114 | 0.7129
Error      | 28 | 15674.75 | 559.8125  |            |
Total      | 31 | 16446    |           |            |

P = 0.7129 > 0.01, so fail to reject H₀. There is not
enough evidence at the 1% level of significance to reject
the claim that the mean numbers of female students who
played on a sports team are equal for all grades.


17. Fail to reject all null hypotheses. The interaction between
the advertising medium and the length of the ad has no
effect on the rating and therefore there is no significant
difference in the means of the ratings.
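
The one-way ANOVA output printed above (Treatments and Error rows) can be reproduced from its sums of squares and degrees of freedom, since each mean square is SS divided by df and the F statistic is the ratio of the two mean squares. The sketch below is illustrative only; the function name `f_statistic` is hypothetical, and the inputs are the values printed in that ANOVA table.

```python
def f_statistic(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between   # mean square for treatments
    ms_within = ss_within / df_within      # mean square for error
    return ms_between, ms_within, ms_between / ms_within


# Treatments: SS = 771.25, df = 3. Error: SS = 15674.75, df = 28.
ms_b, ms_w, f = f_statistic(771.25, 3, 15674.75, 28)
print(round(ms_b, 5), ms_w, round(f, 8))
```

The printed values can be compared with the MS and F-Stat columns of the table.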
19. Fail to reject all null hypotheses. The interaction between
age and gender has no effect on GPA and therefore there
is no significant difference in the means of the GPAs.


21. CV_Scheffé = 10.98
(1, 2) → 22.396 → Significant difference
(1, 3) → 40.837 → Significant difference
(2, 3) → 2.749 → No difference


23. [CV_Scheffé value and the (1, 2) and (1, 3) comparisons are illegible in the source]
(1, 4) → 1.345 → No difference
(2, 3) → 2.374 → No difference
(2, 4) → 1.122 → No difference
(3, 4) → 7.499 → Significant difference


Uses and Abuses for Chapter 10 (page 587)


1-2. Answers will vary.


Review Answers for Chapter 10 (page 589)


1. (a) H₀: The distribution of the allowance amounts is 29%
less than $10, 16% $10 to $20, 9% more than $21, and
46% don’t give one/other.
Hₐ: The distribution of amounts differs from the
claimed or expected distribution. (claim)

(b) χ²₀ = 6.251; Rejection region: χ² > 6.251

(c) 4.886 (d) Fail to reject H₀.

(e) There is not enough evidence at the 10% level of
significance to conclude that there has been a change
in the claimed or expected distribution.


3. (a) H₀: The distribution of responses from golf students
about what they need the most help with is 22%
approach and swing, 9% driver shots, 4% putting,
and 65% short-game shots. (claim)
Hₐ: The distribution of responses differs from the
claimed or expected distribution.

(b) χ²₀ = 7.815; Rejection region: χ² > 7.815

(c) 0.503 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to reject the claim that the distribution of golf
students’ responses is the same as the claimed or
expected distribution.


5. (a) E₁,₁ = 63, E₁,₂ = 356.4, E₁,₃ = 319.8, E₁,₄ = 310.8,
E₂,₁ = 147, E₂,₂ = 831.6, E₂,₃ = 746.2, E₂,₄ = 725.2

(b) Reject H₀.

(c) There is enough evidence at the 1% level of
significance to conclude that public school teachers’
gender and years of full-time teaching experience
are related.


7. (a) E₁,₁ ≈ 54.86, E₁,₂ ≈ 40.38, E₁,₃ ≈ 22.10, E₁,₄ = 16.00,
E₁,₅ ≈ 26.67, E₂,₁ ≈ 17.14, E₂,₂ ≈ 12.62, E₂,₃ ≈ 6.90,
E₂,₄ = 5.00, E₂,₅ ≈ 8.33

(b) Reject H₀.

(c) There is enough evidence at the 1% level of significance
to conclude that a species’ status (endangered or
threatened) is dependent on vertebrate group.


9. 2.295 11. 2.39 13. 2.06 15. 2.08


17. Fail to reject H₀. There is not enough evidence at the
1% level of significance to reject the claim.


19. Fail to reject H₀. There is not enough evidence at the 10%
level of significance to support the claim that the variation
in wheat production is greater in Garfield County than in
Kay County.


21. Fail to reject H₀. There is not enough evidence at the 1%
level of significance to support the claim that the test score
variance for females is different from that for males.


23. Reject H₀. There is enough evidence at the 10% level of
significance to conclude that at least one of the mean costs
is different from the others.


Chapter Quiz for Chapter 10 (page 072)


1. (a) H₀: σ₁² = σ₂²; Hₐ: σ₁² ≠ σ₂² (claim)

(b) 0.01 (c) F₀ = 3.80

(d) Rejection region: F > 3.80

(e) 2.12 (f) Fail to reject H₀.

(g) There is not enough evidence at the 1% level of
significance to conclude that the variances in annual
wages for San Francisco, CA and Baltimore, MD are
different.


2. (a) H₀: μ₁ = μ₂ = μ₃ (claim)
Hₐ: At least one mean is different from the others.

(b) 0.10 (c) F₀ = 2.44

(d) Rejection region: F > 2.44

(e) 27.48 (f) Reject H₀.

(g) There is enough evidence at the 10% level of
significance to reject the claim that the mean annual
wages are equal for all three cities.


3. (a) H₀: The distribution of educational achievement for
people in the United States ages 35-44 is 13.4% not a
high school graduate, 31.2% high school graduate,
17.2% some college, no degree; 8.8% associate’s degree,
19.1% bachelor’s degree, and 10.3% advanced degree.
Hₐ: The distribution of educational achievement for
people in the United States ages 35-44 differs from the
claimed distribution. (claim)

(b) 0.05 (c) χ²₀ = 11.071

(d) Rejection region: χ² > 11.071

(e) 3.799 (f) Fail to reject H₀.

(g) There is not enough evidence at the 5% level of
significance to conclude that the distribution for people
in the United States ages 35-44 differs from the
distribution for people ages 25 and older.
4. (a) H₀: The distribution of educational achievement for
people in the United States ages 65-74 is 13.4% not
a high school graduate, 31.2% high school graduate,
17.2% some college, no degree; 8.8% associate’s
degree, 19.1% bachelor’s degree, and 10.3% advanced
degree.
Hₐ: The distribution of educational achievement for
people in the United States ages 65-74 differs from
the claimed distribution. (claim)

(b) 0.01

(c) χ²₀ = 15.086

(d) Rejection region: χ² > 15.086

(e) 26.175

(f) Reject H₀.

(g) There is enough evidence at the 1% level of
significance to conclude that the distribution for
people in the United States ages 65-74 differs from
the distribution for people ages 25 and older.


Real Statistics-Real Decisions for Chapter 10 (page 594) 


1. Reject H₀. There is enough evidence at the 1% level of
significance to conclude that the distribution of responses
differs from the claimed or expected distribution.

2. (a) E₁,₁ = 15, E₁,₂ = 120, E₁,₃ = 165, E₁,₄ = 185,
E₁,₅ = 135, E₁,₆ = 115, E₁,₇ = 155, E₁,₈ = 110,
E₂,₁ = 15, E₂,₂ = 120, E₂,₃ = 165, E₂,₄ = 185,
E₂,₅ = 135, E₂,₆ = 115, E₂,₇ = 155, E₂,₈ = 110

(b) There is enough evidence at the 1% level of
significance to conclude that the ages of the victims
are related to the type of fraud.


CHAPTER 11 
Section 11.1 (page 604) 


1. A nonparametric test is a hypothesis test that does not 
require any specific conditions concerning the shapes of 
populations or the values of population parameters. 


A nonparametric test is usually easier to perform than its 
corresponding parametric test, but the nonparametric test 
is usually less efficient. 


3. When n is less than or equal to 25, the test statistic is equal
to x (the smaller number of + or - signs).
When n is greater than 25, the test statistic is equal to

z = ((x + 0.5) - 0.5n) / (√n / 2)
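
The two cases in the answer above can be sketched as a small function. This is an illustration under stated assumptions, not part of the original key: the function name `sign_test_statistic` and the second example (10 plus signs, 20 minus signs) are hypothetical, while the first example uses the counts of 11 above and 2 below that appear in the hypothesis test output later in this section.

```python
import math


def sign_test_statistic(plus_signs, minus_signs):
    """Sign test statistic: x when n <= 25, otherwise the z approximation."""
    n = plus_signs + minus_signs          # ties are excluded before counting
    x = min(plus_signs, minus_signs)      # smaller number of + or - signs
    if n <= 25:
        return x
    return ((x + 0.5) - 0.5 * n) / (math.sqrt(n) / 2)


print(sign_test_statistic(11, 2))             # small sample: returns x
print(round(sign_test_statistic(10, 20), 3))  # large sample: returns z
```

For n of 25 or less the returned x is compared with a critical value from the sign test table; for larger n the returned z is compared with a standard normal critical value.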

5. Identify the claim and state H₀ and Hₐ. Identify the level
of significance and sample size. Find the critical value using
Table 8 (if n ≤ 25) or Table 4 (if n > 25). Calculate the test
statistic. Make a decision and interpret it in the context of
the problem.
7. (a) H₀: median ≤ $300; Hₐ: median > $300 (claim)

(b) 1 (c) 5 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance for the accountant to conclude that the
median amount of new credit card charges for the
previous month was more than $300.


9. (a) H₀: median ≤ $198,000 (claim)
Hₐ: median > $198,000

(b) 1 (c) 4 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to reject the agent’s claim that the median
sales price of new privately owned one-family homes
sold in the past year is $198,000 or less.


11. (a) H₀: median ≥ $3000 (claim); Hₐ: median < $3000

(b) -2.05 (c) -1.47 (d) Fail to reject H₀.

(e) There is not enough evidence at the 2% level of
significance to reject the institution’s claim that the
median amount of credit card debt for families holding
such debts is at least $3000.


13. (a) H₀: median ≤ 30; Hₐ: median > 30 (claim)

(b) 4 (c) 10 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to support the research group’s claim that
the median age of Twitter® users is greater than
30 years old.


15. (a) H₀: median = 4 (claim); Hₐ: median ≠ 4

(b) -1.96 (c) -1.90 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to reject the organization’s claim that the
median number of rooms in renter-occupied units is 4.


17. (a) H₀: median = $37.06 (claim); Hₐ: median ≠ $37.06

(b) -2.575 (c) -0.91 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to reject the labor organization’s claim that
the median hourly wage of computer systems analysts
is $37.06.


19. (a) H₀: The lower back pain intensity scores have not
decreased.
Hₐ: The lower back pain intensity scores have
decreased. (claim)

(b) 1 (c) 0 (d) Reject H₀.

(e) There is enough evidence at the 5% level of
significance to conclude that the lower back pain
intensity scores were lower after the acupuncture.


21. (a) H₀: The SAT scores have not improved.
Hₐ: The SAT scores have improved. (claim)

(b) 2 (c) 4 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to conclude that the critical reading SAT
scores improved.
23. (a) Reject H₀.

(b) There is enough evidence at the 5% level of
significance to reject the claim that the proportion of
adults who feel older than their real age is equal to the
proportion of adults who feel younger than their real
age.


25. Hypothesis test results:
Parameter: median of Variable
H₀: Parameter = 22.55
Hₐ: Parameter ≠ 22.55

Variable                  | n  | n for test | Sample Median | Below | Equal | Above | P-value
Hourly wages (in dollars) | 14 | 13         | 26.075        | 2     | 1     | 11    | 0.0225

P = 0.0225 < 0.05, so reject H₀. There is enough evidence
at the 5% level of significance to reject the labor
organization’s claim that the median hourly wage of
tool and die makers is $22.55.


27. (a) H₀: median ≤ $638 (claim); Hₐ: median > $638

(b) 2.33 (c) 1.46 (d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to reject the organization’s claim that the
median weekly earnings of female workers is less than
or equal to $638.


29. (a) H₀: median ≤ 26 (claim); Hₐ: median > 26

(b) 1.645 (c) 1.302 (d) Fail to reject H₀.

(e) There is not enough evidence at the 5% level of
significance to reject the counselor’s claim that the
median age of brides at the time of their first marriage
is less than or equal to 26 years.


Section 11.2 (page 615) 


1. If the samples are dependent, use a Wilcoxon signed-rank
test. If the samples are independent, use a Wilcoxon rank
sum test.


3. (a) H₀: There is no reduction in diastolic blood pressure.
(claim)
Hₐ: There is a reduction in diastolic blood pressure.

(b) Wilcoxon signed-rank test

(c) 10 (d) 17 (e) Fail to reject H₀.

(f) There is not enough evidence at the 1% level of
significance to reject the claim that there was no
reduction in diastolic blood pressure.


5. (a) H₀: The cost of prescription drugs is not lower in
Canada than in the United States.
Hₐ: The cost of prescription drugs is lower in Canada
than in the United States. (claim)

(b) Wilcoxon signed-rank test

(c) 4 (d) 6 (e) Fail to reject H₀.

(f) There is not enough evidence at the 5% level of
significance for the researcher to conclude that the cost
of prescription drugs is lower in Canada than in the
United States.


7. (a) H₀: There is no difference in salaries.
Hₐ: There is a difference in salaries. (claim)

(b) Wilcoxon rank sum test

(c) ±1.96 (d) -1.94 (e) Fail to reject H₀.

(f) There is not enough evidence at the 5% level of
significance to support the representative’s claim that
there is a difference in the salaries earned by teachers
in Wisconsin and Michigan.


9. Reject H₀. There is enough evidence at the 10% level of
significance for the engineer to conclude that the gas
mileage is improved.


Section 11.3 (page 623)


1. The conditions for using a Kruskal-Wallis test are that each
sample must be randomly selected and the size of each
sample must be at least 5.


3. (a) H₀: There is no difference in the premiums.
Hₐ: There is a difference in the premiums. (claim)

(b) 5.991 (c) 9.506 (d) Reject H₀.

(e) There is enough evidence at the 5% level of
significance to conclude that the distributions of the
annual premiums of the three states are different.


5. (a) H₀: There is no difference in the salaries.
Hₐ: There is a difference in the salaries. (claim)

(b) 6.251 (c) 1.202 (d) Fail to reject H₀.

(e) There is not enough evidence at the 10% level of
significance to conclude that the distributions of the
annual salaries in the four states are different.


7. Kruskal-Wallis results:
Data stored in separate columns.

Chi Square = 8.0965185 (adjusted for ties)
DF = 2
P-value = 0.0175

Column | n | Median | Ave. Rank
A      | 6 | 5      | 6.75
B      | 6 | 8.5    | 14.5
C      | 6 | 5      | 7.25

P = 0.0175 > 0.01, so fail to reject H₀. There is not
enough evidence at the 1% level of significance to
conclude that the distributions of the number of job offers
at Colleges A, B, and C are different.


9. (a) Fail to reject H₀.

(b) Fail to reject H₀.

Both tests come to the same decision, which is that there is
not enough evidence to support the claim that there is a
difference in the number of days spent in the hospital.
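
The P-values in the technology outputs of this answer key are right-tail areas under a chi-square curve. As a rough cross-check, the sketch below computes that area with a power series for the regularized incomplete gamma function, using only the Python standard library. The function name `chi_square_p_value` is hypothetical, and this is one possible numerical approach rather than the method the software behind those outputs uses.

```python
import math


def chi_square_p_value(statistic, df, max_terms=500):
    """Right-tail area P(X > statistic) for a chi-square variable with df
    degrees of freedom, via the series for the lower incomplete gamma."""
    if statistic <= 0:
        return 1.0
    a = df / 2.0
    x = statistic / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)               # next term of the power series
        total += term
        if term < total * 1e-16:          # stop once terms are negligible
            break
    lower_area = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - lower_area


# Kruskal-Wallis output in this section: Chi Square = 8.0965185, DF = 2.
print(round(chi_square_p_value(8.0965185, 2), 4))
# Goodness-of-fit output in Section 10.1: Chi-Square = 18.637629, DF = 9.
print(round(chi_square_p_value(18.637629, 9), 4))
```

The results can be compared with the printed P-values (0.0175 and 0.0285) and then with the chosen level of significance.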
Section 11.4 (page 628)


1. The Spearman rank correlation coefficient can be used to
describe the relationship between linear or nonlinear data.
Also, it can be used for data at the ordinal level and it is
easier to calculate by hand than the Pearson correlation
coefficient.


3. The ranks of the corresponding data are identical when rₛ
is equal to 1. The ranks are in “reverse” order when rₛ is
equal to -1. The ranks have no relationship when rₛ is
equal to 0.


5. (a) H₀: ρₛ = 0; Hₐ: ρₛ ≠ 0 (claim)

(b) 0.929

(c) 0.857

(d) Fail to reject H₀.

(e) There is not enough evidence at the 1% level of
significance to support the claim that there is a
correlation between debt and income in the farming
business.


7. (a) H₀: ρₛ = 0; Hₐ: ρₛ ≠ 0 (claim)

(b) 0.833

(c) 0.950

(d) Reject H₀.

(e) There is enough evidence at the 1% level of
significance to conclude that there is a correlation
between the oat and wheat prices.


9. Fail to reject H₀. There is not enough evidence at the 5%
level of significance to conclude that there is a correlation
between science achievement scores and GNI.


11. Reject H₀. There is enough evidence at the 5% level of
significance to conclude that there is a correlation between
science and mathematics achievement scores.


13. Fail to reject H₀. There is not enough evidence at the 5%
level of significance to conclude that there is a correlation
between average hours worked and the number of
on-the-job injuries.
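
The rank patterns described in this section's answers (identical ranks give a coefficient of 1, reversed ranks give -1) can be seen directly from the usual no-ties formula rₛ = 1 - 6Σd² / (n(n² - 1)). The sketch below assumes that formula and untied ranks; the function name `spearman_rs` and the five-item rankings are hypothetical.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman rank correlation for two lists of untied ranks."""
    n = len(ranks_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


ranks = [1, 2, 3, 4, 5]
print(spearman_rs(ranks, ranks))         # identical ranks
print(spearman_rs(ranks, ranks[::-1]))   # ranks in reverse order
```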


Section 11.5 (page 637) 


1. Answers will vary. Sample answer: It is called the runs test
because it considers the number of runs of data in a sample
to determine whether the sequence of data was randomly
selected.


3. Number of runs: 8
Run lengths: 1, 1, 1, 1, 3, 3, 1, 1


5. Number of runs: 9
Run lengths: 1, 1, 1, 1, 1, 6, 3, 2, 4


7. n₁ = number of T’s = 6
n₂ = number of F’s = 6


9. n₁ = number of M’s = 10
n₂ = number of F’s = 10


11. too high: 11; too low: 3


13. too high: 14; too low: 5
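
Counting runs, run lengths, and the two category counts, as in the answers above, can be sketched with the standard library. The sequence below is hypothetical (the exercises' data are not reproduced in this key); it was chosen only so that its run lengths are 1, 1, 1, 1, 3, 3, 1, 1. The function name `run_lengths` is also hypothetical.

```python
from itertools import groupby


def run_lengths(sequence):
    """Lengths of consecutive runs of identical symbols, in order."""
    return [len(list(group)) for _, group in groupby(sequence)]


sequence = "TFTFTTTFFFTF"   # hypothetical sequence of T's and F's

lengths = run_lengths(sequence)
print("Number of runs:", len(lengths))
print("Run lengths:", lengths)
print("n1 =", sequence.count("T"), " n2 =", sequence.count("F"))
```

The number of runs is the runs test statistic for small samples; n₁ and n₂ are the counts used to look up the critical values.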
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15. 


17. 


19. 


21. 


23. 


(a) H0: The coin tosses were random.
Ha: The coin tosses were not random. (claim)
(b) lower critical value = 4
upper critical value = 14
(c) 9 (d) Fail to reject H0.
(e) There is not enough evidence at the 5% level of 


significance to support the claim that the coin tosses 
were not random. 


(a) H0: The sequence of leagues of winning teams is
random.
Ha: The sequence of leagues of winning teams is not
random. (claim)

(b) 1.96 (c) 1.79 (d) Fail to reject H0.

(e) There is not enough evidence at the 5% level of 


significance to conclude that the sequence of leagues of 
World Series winning teams is not random. 


(a) H0: The microchips are random by gender. (claim)
Ha: The microchips are not random by gender.

(b) lower critical value = 8
upper critical value = 18

(c) 12 (d) Fail to reject H0.

(e) There is not enough evidence at the 5% significance 


level to reject the claim that the microchips are 
random by gender. 


Fail to reject H0. There is not enough evidence at the 5%
level of significance to support the claim that the daily high 
temperatures do not occur randomly. 


Answers will vary. 


Uses and Abuses for Chapter 11 (page 639) 


1. 
2. 


Answers will vary. 

Sign test → z- or t-test

Paired-sample sign test → t-test
Wilcoxon signed-rank test → t-test
Wilcoxon rank sum test → z- or t-test
Kruskal-Wallis test → one-way ANOVA


Spearman rank correlation coefficient → Pearson
correlation coefficient


Review Answers for Chapter 11 (page 641) 


1. 


3. 


(a) H0: median ≤ 650 (claim); Ha: median > 650

(b) 2 (c) 7 (d) Fail to reject H0.

(e) There is not enough evidence at the 1% level of 
significance to reject the bank manager’s claim that 


the median number of customers per day is no more 
than 650. 


(a) H0: median = 2 (claim); Ha: median ≠ 2
(b) -1.645 (c) -3.26 (d) Reject H0.
(e) There is enough evidence at the 10% level of 


significance to reject the agency’s claim that the median 
sentence length for all federal prisoners is 2 years. 
5.


11. 


13. 
(a) H0: There is no reduction in diastolic blood pressure.
(claim)
Ha: There is a reduction in diastolic blood pressure.
(b) 2 (c) 3 (d) Fail to reject H0.
(e) There is not enough evidence at the 5% level of


significance to reject the claim that there was no 
reduction in diastolic blood pressure. 


(a) Independent; Wilcoxon rank sum test

(b) H0: There is no difference in the total times to earn
a doctorate degree by female and male graduate
students.

Ha: There is a difference in the total times to earn
a doctorate degree by female and male graduate
students. (claim)

(c) ±2.575 (d) -1.357 (or 1.357)

(e) Fail to reject H0.

(f) There is not enough evidence at the 1% level of 
significance to support the claim that there is a 


difference in the total times to earn a doctorate degree 
by female and male graduate students. 


(a) H0: There is no difference in the ages of doctorate
recipients among the fields of study.
Ha: There is a difference in the ages of doctorate
recipients among the fields of study. (claim)

(b) 9.210 (c) 6.741 (d) Fail to reject H0.

(e) There is not enough evidence at the 1% level of 
significance to conclude that the distributions of ages 
of the doctorate recipients in these three fields of study 
are different. 

(a) H0: ρ_s = 0; Ha: ρ_s ≠ 0 (claim)

(b) 0.786 (c) 0.830 (d) Reject H0.

(e) There is enough evidence at the 5% level of
significance to conclude that there is a correlation 
between overall score and price. 

(a) H0: The traffic stops were random by gender.

Ha: The traffic stops were not random by gender.
(claim)

(b) lower critical value = 8
upper critical value = 19

(c) 14 (d) Fail to reject H0.

(e) There is not enough evidence at the 5% level of 


significance to support the claim that the stops were 
not random by gender. 


Chapter Quiz for Chapter 11 (page 645) 


1. 


(a) H0: There is no difference in the hourly earnings.
Ha: There is a difference in the hourly earnings. (claim)
(b) Wilcoxon rank sum test
(c) ±1.645 (d) -3.326 (or 3.326) (e) Reject H0.
(f) There is enough evidence at the 10% level of 
significance to support the organization’s claim that 


there is a difference in the hourly earnings of union 
and nonunion workers in state and local governments. 


2. (a) H0: median = 52 (claim); Ha: median ≠ 52


(b) Sign test (c) ±1.96 (d) -2.75 (e) Reject H0.
(f) There is enough evidence at the 5% level of 


significance to reject the organization’s claim that the 
median number of annual volunteer hours is 52 hours. 


3. (a) H0: There is no difference in the sales prices among the
regions.
Ha: There is a difference in the sales prices among the
regions. (claim)

(b) Kruskal-Wallis test

(c) 11.345 (d) 25.957 (e) Reject H0.

(f) There is enough evidence at the 1% level of 


significance to conclude that the distributions of 
the sales prices in these regions are different. 


4. (a) H0: The days with rain are random.
Ha: The days with rain are not random. (claim)
(b) Runs test
(c) lower critical value = 10
upper critical value = 22
(d) 16 (e) Fail to reject H0.


(f) There is not enough evidence at the 5% level of 
significance for the meteorologist to conclude that days 
with rain are not random. 


5. (a) H0: ρ_s = 0; Ha: ρ_s ≠ 0 (claim)

(b) Spearman rank correlation coefficient

(c) 0.829 (d) 0.886 (e) Reject H0.

(f) There is enough evidence at the 10% level of 
significance to conclude that there is a correlation 


between the number of larceny-thefts and the number 
of motor vehicle thefts. 


Real Statistics-Real Decisions for Chapter 11 (page 646) 


1. (a) Answers will vary. 


(b) Answers will vary. 


(c) Answers will vary. 


2. (a) Answers will vary. 


(b) Sign test; You need to use the nonparametric test 
because nothing is known about the shape of the 
population. 

(c) H0: median = 4.1; Ha: median < 4.1 (claim)

(d) Fail to reject H0. There is not enough evidence at the
5% level of significance to support the claim that the 
median tenure for workers from the representative’s 
district is less than 4.1 years. 


3. (a) Wilcoxon rank sum test; You need to use the 


nonparametric test because nothing is known about the 
shape of the population. 


(b) H0: There is no difference between the median tenures
for male workers and female workers.

Ha: There is a difference between the median tenures
for male workers and female workers. (claim) 
(c) Fail to reject H0. There is not enough evidence at the
5% level of significance to support the claim that there
is a difference between the median tenures for male
workers and female workers.


Cumulative Review for Chapters 9-11 (page 648)


1. (a) [Scatter plot: women’s time (in seconds) versus
men’s time (in seconds)]
r = 0.815; strong positive linear correlation

(b) Reject H0. There is enough evidence at the 5% level of
significance to conclude that there is a significant linear
correlation between the men’s and women’s winning
100-meter times.

(c) ŷ = 1.264x - 1.581
[Graph: regression line, women’s time (in seconds) versus
men’s time (in seconds)]

(d) 10.93 seconds


2. There is enough evidence at the 5% level of significance to
support the agency’s claim that there is a difference in the
weekly earnings of workers who are union members and
workers who are not union members.


3. There is not enough evidence at the 1% level of
significance to reject the company’s claim that the median
age of people with mutual funds is 50 years.

4. There is enough evidence at the 10% level of significance
to reject the claim that the mean expenditures are equal
for all four regions.

5. (a) 17,876.15 pounds per acre
(b) 20,148.12 pounds per acre


6. There is not enough evidence at the 10% level of
significance to reject the administrator’s claim that the
standard deviations of reading test scores for eighth grade
students are the same in Colorado and Utah.


7. There is enough evidence at the 1% level of significance
for the representative to conclude that the distributions of
annual household incomes in these regions are different.


8. There is not enough evidence at the 5% level of
significance to conclude that the distribution of how much
parents intend to contribute to their children’s college costs
differs from the claimed or expected distributions.


9. (a) 0.733; About 73.3% of the variation in height can be
explained by the variation in metacarpal bone length;
About 26.7% of the variation is unexplained.
(b) 4.255; The standard error of estimate of the height
for a specific metacarpal bone length is about
4.255 centimeters.
(c) 168.026 < y < 190.83; You can be 95% confident that
the height will be between 168.026 centimeters and
190.83 centimeters when the metacarpal bone length
is 50 centimeters.


10. There is enough evidence at the 10% level of significance
to conclude that there is a correlation between the overall
score and the price.
INDEX


A 


Addition Rule, 189 
for the probability of A 
or B, 157, 160
alternative formula 
for the standardized test 
statistic for a proportion, 
402 
for variance and standard 
deviation, 96 
alternative hypothesis 
one-sample, 357 
two-sample, 430 
analysis of variance 
(ANOVA) test 
one-way, 574, 575 
two-way, 580 
approximating binomial 
probabilities, 284 
area of a region 
under a probability curve, 239 
under a standard normal 


curve, 239 
B 
back-to-back stem-and-leaf 
plot, 64 


Bayes’ Theorem, 154 
biased sample, 20 
bimodal, 67 
binomial distribution, 221 
mean of a, 210 
normal approximation to a, 
281 
population parameters of a, 
210 
standard deviation of a, 210 
variance of a, 210 
binomial experiment, 202, 540 
notation for, 202 
binomial probabilities, using 
the normal distribution to 
approximate, 284 
binomial probability 
distribution, 205, 221 
binomial probability formula, 204 
bivariate normal distribution, 517 
blinding, 18 
blocks, 18 
box-and-whisker plot, 102 
side-by-side, 111 
boxplot, 102 
modified, 112 


C 


calculating a correlation 
coefficient, 488 
categories, 540 
c-confidence interval 
for the population mean, 307 
for the population
proportion, 328 
cell, 551 
census, 3, 20 
center, 38 
central angle, 56 
Central Limit Theorem, 268 
chart 
control, 256 
Pareto, 57 
pie, 56 
time series, 59 
Chebychev’s Theorem, 87 
Chi-square 
distribution, 337 
goodness-of-fit test, 540, 542 
independence test, 553 
test 
finding critical values for, 404 
for independence, 553 
for standard deviation, 406, 
415 
test statistic for, 406 
for variance, 406, 415 
class, 38 
boundaries, 42 
mark, 40 
width, 38 
class limit 
lower, 38 
upper, 38 
classical probability, 132, 160 
closed question, 25 
cluster sample, 21 
clusters, 21 
of data, 71 
coefficient 
correlation, 487 
t-test for, 492 
of determination, 514 
of variation, 96 
combination 
of n objects taken r at a 
time, 170, 171 
complement of event E, 136 
complementary events, 160 
completely randomized 
design, 18 
conditional probability, 145 
conditional relative frequency, 
563 
confidence, level of, 305 
confidence interval, 307 
for σ₁²/σ₂², 573
for the difference between 
means, 440, 450 
for the difference between 
two population 
proportions, 468 
for the mean of the 
differences of paired 
data, 460 


for a population mean, 
finding a, 307 
for a population proportion, 
constructing a, 328 
for a population standard 
deviation, 339 
for a population variance, 
339 
for slope, 523 
for y-intercept, 523 
confounding variable, 18 
constructing 
a confidence interval for the 
difference between 
means, 440, 450 
a confidence interval for the 
difference between two 
population 
proportions, 468 
a confidence interval for the 
mean of the differences 
of paired data, 460 
a confidence interval for the 
mean: t-distribution, 320 
a confidence interval for a 
population proportion, 
328 
a confidence interval for a 
population standard 
deviation, 339 
a confidence interval for a 
population variance, 339 
a discrete probability 
distribution, 192 
a frequency distribution 
from a data set, 38 
an ogive, 45 
a prediction interval for y 
for a specific value of x, 
517
contingency table, 551 
contingency table cells, finding 
the expected frequency 
for, 551 
continuity correction, 283 
continuous probability 
distribution, 236 
continuous random variable, 
190, 236 
control 
chart, 256 
group, 16 
convenience sample, 22 
correction, continuity, 283 
correction factor 
finite, 278 
finite population, 316 
correlation, 484 
correlation coefficient, 487 
Pearson product moment, 487 
Spearman rank, 625 
t-test for, 492 
using a table for, 490
counting principle, 
fundamental, 130, 171 
c-prediction interval, 517 
critical region, 376 
critical value, 305, 376 
in a normal distribution, 
finding, 376 
in a t-distribution, finding, 387
cumulative frequency, 40 
graph, 44 
curve, normal, 236 


D 


data, 2 
qualitative, 9 
quantitative, 9 
data sets 
center of, 38 
paired, 58 
shape of, 38 
variability of, 38 
decision rule 
based on P-value, 364, 371 
based on rejection region, 378 
degrees of freedom, 318 
corresponding to the 
variance in the 
denominator, 565 
corresponding to the 
variance in the 
numerator, 565 
density function, probability, 236 
dependent 
event, 146 
random variable, 201 
sample, 428 
variable, 484 
descriptive statistics, 5 
designing a statistical study, 16 
determination, coefficient of, 514 
deviation, 81 
explained, 513 
total, 513 
unexplained, 513 
d.f.D, 565
d.f.N, 565
diagram, tree, 128 
discrete probability 
distribution, 191 
discrete random variable, 190 
expected value of a, 196 
mean of a, 194 
standard deviation of a, 195 
variance of a, 195 
distinguishable permutation, 170 
distribution 
binomial, 221 
binomial probability, 205, 221 
bivariate normal, 517 
chi-square, 337 
continuous probability, 236 
discrete probability, 191
F-, 565 
frequency, 38 
geometric, 218, 221 
hypergeometric, 224 
normal, 236 
finding critical values in, 376 
properties of a, 236 
Poisson, 219, 221 
sampling, 266 
standard normal, 239, A1, A2
properties of, 239, A2 
t-, 318 
finding critical values in, 387 
uniform, 248 
dot plot, 55 
double-blind experiment, 18 
drawing a box-and-whisker 
plot, 103 


E 


e, 219 
effect 
Hawthorne, 18 
interaction, 580 
main, 580 
placebo, 18 
elements of well-designed 
experiment, 18 
empirical probability, 133, 160 
Empirical Rule (or 68-95-99.7 
Rule), 86 
equation 
exponential, 510 
logarithmic, 510 
multiple regression, 524 
of a regression line, 502 
power, 510 
error 
of estimate 
maximum, 306 
standard, 515 
margin of, 306 
of the mean 
standard, 266 
sampling, 20, 266, 306 
tolerance, 306 
type I, 359 
type II, 359 
estimate 
interval, 305 
point, 304 
pooled, of the standard 
deviation, 442 
standard error of, 515 
estimating p by minimum 
sample size, 331 
estimator, unbiased, 304 
event, 128 
complement of an, 136 
dependent, 146 
independent, 146, 160 
mutually exclusive, 156, 160 
simple, 129 
expected frequency, 541 
finding for contingency
table cells, 551 
expected value, 196 
of a discrete random 
variable, 196 
experiment, 16 
binomial, 202, 540 
double-blind, 18 
multinomial, 215, 540
probability, 128 
well-designed, elements of, 18 
experimental design 
completely randomized, 18 
matched-pairs, 19 
randomized block, 18 
experimental unit, 16 
explained deviation, 513 
explained variation, 513 
explanatory variable, 484 
exploratory data analysis 
(EDA), 53 
exponential equation, 510 


F 


factorial, 168 
false positive, 155 
F-distribution, 565 
finding areas under the 
standard normal curve, 
241, A4 
finding a confidence interval 
for a population mean, 307 
finding critical values 
for the chi-square test, 404 
for the F-distribution, 566 
in a normal distribution, 376 
in a t-distribution, 387
finding the expected frequency 
for contingency table 
cells, 551 
finding the mean of a 
frequency distribution, 70 
finding a minimum sample size 
to estimate μ, 310
to estimate p, 331 
finding the P-value for a 
hypothesis test, 371 
finding the standard error of 
estimate, 515 
finding the test statistic for the 
one-way ANOVA test, 575 
finite correction factor, 278 
finite population correction 
factor, 316 
first quartile, 100 
five-number summary, 103 
formula, binomial probability, 
204 
fractiles, 100 
frequency, 38 
conditional relative, 563 
cumulative, 40 
expected, 541 
joint, 551 
marginal, 551 
observed, 541 


relative, 40 
frequency distribution, 38 
mean of, 70 
rectangular, 71 
skewed left (negatively 
skewed), 71 
skewed right (positively 
skewed), 71 
symmetric, 71 
uniform, 71 
frequency histogram, 42 
relative, 44 
frequency polygon, 43 
F-test for variances, 
two-sample, 568 
function, probability density, 236 
Fundamental Counting 
Principle, 130, 171 


G 


Gallup poll, 351 

gaps, 68 

geometric distribution, 218, 221 
mean of a, 224 
variance of a, 224 

geometric probability, 218 

goodness-of-fit test, chi-square, 

540, 542 

graph 
cumulative frequency, 44 
misleading, 64 


H 


Hawthorne effect, 18 
histogram 

frequency, 42 

relative frequency, 44 
history of statistics timeline, 33 
homogeneity of proportions 

test, 561 

hypergeometric distribution, 224 
hypothesis 

alternative, 357, 430 

null, 357, 430 

statistical, 357 
hypothesis test, 356 

finding the P-value for, 371 
hypothesis testing 

for slope, 523 

steps for, 365 

summary of, 414, 415 


I


independence test, chi-square,
553
independent, 551 
event, 146, 160 
random variable, 201 
sample, 428 
variable, 484 
inferential statistics, 5 
inflection points, 236, 237 
influential point, 509 
inherent zero, 11 
interaction effect, 580 
interquartile range (IQR), 102 
interval, c-prediction, 517

interval estimate, 305

interval level of measurement,
11, 12

intervals, 38 


J 
joint frequency, 551 


K 


Kruskal-Wallis test, 619 
test statistic for, 619 


L 


law of large numbers, 134 
leaf, 53 
left, skewed, 71 
left-tailed test, 362 
for a population correlation 
coefficient, 492 
length of a run, 631 
level of confidence, 305 
level of significance, 361, 490 
levels of measurement 
interval, 11, 12
nominal, 10, 12
ordinal, 10, 12
ratio, 11, 12
limit 
lower class, 38 
upper class, 38 
line 
of best fit, 501 
regression, 490, 501 
linear transformation of a 
random variable, 201 
logarithmic 
equation, 510 
transformation, 510 
lower class limit, 38 


M 


main effect, 580 
making an interval estimate, 305 
margin of error, 306 
marginal frequency, 551 
matched samples, 428 
matched-pairs design, 19 
maximum error of estimate, 306 
mean, 65 
of a binomial distribution, 210 
difference between 
two-sample t-test for, 442 
two-sample z-test for, 431 
of a discrete random 
variable, 194 
of a frequency distribution, 70 
of a geometric distribution, 
224 
standard error, 266 
trimmed, 78 
t-test for, 389 
weighted, 69 
mean absolute deviation 
(MAD), 97 
mean square 


between, 574 
within, 574 
means, sampling distribution 
of sample, 266 
measure of central tendency, 65 
measurement 
interval level of, 11, 12 
nominal level of, 10, 12 
ordinal level of, 10, 12 
ratio level of, 11, 12
median, 66 
midpoint, 40 
midquartile, 111 
midrange, 78 
minimum sample size 
to estimate μ, 310
to estimate p, 331 
misleading graph, 64 
mode, 67 
modified boxplot, 112 
multinomial experiment, 215, 540 
multiple regression equation, 524 
Multiplication Rule for the 
probability of A and B, 
147, 160 
mutually exclusive, 156, 160 


N 


n factorial, 168 
negative linear correlation, 484 
negatively skewed, 71 
no correlation, 484 
nominal level of 
measurement, 10, 12 
nonlinear correlation, 484 
nonparametric test, 598 
normal approximation to a 
binomial distribution, 281 
normal curve, 236 
normal distribution, 236 
bivariate, 517 
finding critical values in, 376 
properties of a, 236 
standard, 239, A1, A2
finding areas under, 241, A4 
properties of, 239, A2 
normal probability plot, A28 
normal quantile plot, A28 
notation for binomial 
experiment, 202 
null hypothesis 
one-sample, 357 
two-sample, 430 


O 


observational study, 16 
observed frequency, 541 
odds, 143 

of losing, 143 

of winning, 143 
ogive, 44 
one-way analysis of variance, 574 

test, 574, 575 

finding the test statistic 
for, 575 

open question, 25 
ordered stem-and-leaf plot, 53

ordinal level of measurement, 
10, 12 

outcome, 128 

outlier, 54, 68 


P 


paired data sets, 58 
paired samples, 428 
sign test, performing a, 602 
parameter, 4 
population 
binomial distribution, 210 
Pareto chart, 57 
Pearson product moment 
correlation coefficient, 487 
Pearson’s index of skewness, 97 
performing 
a chi-square goodness-of-fit 
test, 542 
a chi-square test for 
independence, 554 
a Kruskal-Wallis test, 620 
a one-way analysis 
of variance test, 576 
a paired-sample sign test, 602 
a runs test for randomness, 633 
a sign test for a
population median, 599 
a Wilcoxon rank sum test, 612 
a Wilcoxon signed-rank test, 
609 
permutation, 168, 171 
distinguishable, 170, 171 
of n objects taken r at a 
time, 168, 171 
pie chart, 56 
placebo, 16 
effect, 18 
plot 
back-to-back stem-and-leaf, 64 
box-and-whisker, 102 
dot, 55 
normal probability, A28 
normal quantile, A28 
residual, 509 
scatter, 58, 484 
side-by-side box-and-whisker, 
111 
stem-and-leaf, 53 
point, influential, 509 
point estimate, 304 
for σ, 337
for σ², 337
for p, 327 
Poisson distribution, 219, 221 
variance of a, 224 
polygon, frequency, 43 
pooled estimate of the 
standard deviation, 442 
population, 3 
correlation coefficient 
using Table 11 for the, 490 
using the t-test for the, 492
mean, finding a confidence 
interval for, 307 


parameters 
of a binomial 
distribution, 210 
proportion, 327 
constructing a confidence 
interval for, 328 
standard deviation, 82 
variance, 81, 82 
positive linear correlation, 484 
positively skewed, 71 
power equation, 510 
power of the test, 361 
principle, fundamental 
counting, 130, 171 
probability 
Addition Rule for, 157, 160 
classical, 132, 160 
conditional, 145 
curve, area of a region 
under, 239 
density function, 236 
empirical, 133, 160 
experiment, 128 
formula, binomial, 204 
geometric, 218 
Multiplication Rule for, 147, 
160 
rule, range of, 135, 160 
statistical, 133 
subjective, 134 
that the first success will 
occur on trial number x, 
218, 221 
theoretical, 132 
value, 361 
probability distribution 
binomial, 205, 221 
chi-square, 337 
continuous, 236 
discrete, 191 
geometric, 218, 221 
normal, properties of a, 236 
Poisson, 219, 221 
sampling, 266 
standard normal, 239 
probability plot, normal, A28 
properties 
of a normal distribution, 236 
of sampling distributions of 
sample means, 266 
of the standard normal 
distribution, 239, A2 
proportion 
population, 327 
confidence interval for, 328 
z-test for, 398 
sample, 279 
proportions, sampling 
distribution of sample, 279 
proportions test, homogeneity 
of, 561 
P-value, 361 
decision rule based on, 364, 
371 
for a hypothesis test, finding 
the, 371 
Q


qualitative data, 9 
quantile plot, normal, A28 
quantitative data, 9 
quartile, 100 

first, 100 

second, 100 

third, 100 
question 

closed, 25 

open, 25 


R 


random sample, simple, 20 
random sampling, 3 
random variable, 190 
continuous, 190, 236 
dependent, 201 
discrete, 190 
expected value of a, 196 
mean of a, 194 
standard deviation of a, 195 
variance of a, 195 
independent, 201 
linear transformation of a, 201 
randomization, 18 
randomized block design, 18 
randomness, runs test for, 632 
range, 38, 80 
interquartile, 102 
of probabilities rule, 135, 160 
rank correlation coefficient, 
Spearman, 625 
rank sum test, Wilcoxon, 611 
ratio level of measurement, 
11, 12
rectangular, frequency 
distribution, 71 
region 
critical, 376 
rejection, 376 
regression equation, multiple, 524 
regression line, 490, 501 
deviation about, 513 
equation of, 502 
variation about, 513 
rejection region, 376 
decision rule based on, 378 
relative frequency, 40 
conditional, 563 
histogram, 44 
replacement 
with, 21 
without, 21, 203 
replication, 19 
residual plot, 509 
residuals, 501 
response variable, 484 
right, skewed, 71 
right-tailed test, 362 
for a population correlation 
coefficient, 492 
rule
addition, 157, 160 
decision 
based on P-value, 364, 371 
based on rejection
region, 378 
empirical, 86 
multiplication, 147, 160 
range of probabilities, 135, 160 
run, 631 
runs test for randomness, 632 


S


sample, 3 
biased, 20 
cluster, 21 
convenience, 22 
dependent, 428 
independent, 428 
matched, 428 
paired, 428 
random, 20 
simple, 20 
stratified, 21 
systematic, 22 
sample means 
sampling distribution for 
the difference of, 430 
sampling distribution of, 266 
sample proportion, 279 
sample proportions, sampling 
distribution of, 279 
sample size, 19 
minimum to estimate μ, 310
minimum to estimate p, 331 
sample space, 128 
sample standard deviation, 83 
for grouped data, 88 
sample variance, 83 
sampling, 20 
sampling distribution, 266 
for the difference of the 
sample means, 430 
for the difference between 
the sample proportions, 
461 
for the mean of the differences 
of the paired data entries 
in dependent 
samples, 451 
properties of, 266 
of sample means, 266 
of sample proportions, 279 
sampling error, 20, 266, 306 
sampling process 
with replacement, 21 
without replacement, 21 
scatter plot, 58, 484 
Scheffé Test, 586 
score, standard, 105 
second quartile, 100 
shape, 38 
side-by-side box-and-whisker 
plot, 111 
sigma, 39 
sign test, 598 
performing a paired-sample, 
602 
test statistic for, 599 
signed-rank test, Wilcoxon, 609 
significance, level of, 361, 490
simple event, 129 
simple random sample, 20 
simulation, 17 
skewed 
left, 71 
negatively, 71 
positively, 71 
right, 71 
slope 
confidence interval for, 523 
hypothesis testing for, 523 
Spearman rank correlation 
coefficient, 625 
standard deviation 
of a binomial distribution, 210 
chi-square test for, 406, 415 
confidence intervals for, 339 
of a discrete random 
variable, 195 
point estimate for, 337 
pooled estimate of, 442 
population, 82 
sample, 83 
standard error 
of estimate, 515 
of the mean, 266 
standard normal curve, finding 
areas under, 241, A4 
standard normal distribution, 
239, A1, A2
properties of, 239, A2 
standard score, 105 
standardized test statistic, 
for a chi-square test 
for standard deviation, 
406, 415 
for variance, 406, 415 
for the correlation 
coefficient 
t-test, 492 
for the difference between 
means 
t-test, 452 
z-test, 431 
for the difference between 
proportions z-test, 462 
for a t-test
for a mean, 389, 415
two-sample, 442 
for a z-test 
for a mean, 373, 415 
for a proportion, 398, 415 
two-sample, 431 
statistic, 4 
statistical hypothesis, 357 
statistical probability, 133 
statistical process control 
(SPC), 256 
statistical study, designing a, 16 
statistics, 2 
descriptive, 5 
history of, timeline, 33 
inferential, 5 
status, 2 
stem, 53 
stem-and-leaf plot, 53 


back-to-back, 64 
ordered, 53 
unordered, 53 
steps for hypothesis testing, 365 
strata, 21 
stratified sample, 21 
study 
observational, 16 
statistical, designing a, 16 
subjective probability, 134 
successes, population proportion 
of, 327 
sum of squares, 81-83 
sum test, Wilcoxon rank, 611 
summary 
of counting principles, 171 
of discrete probability 
distributions, 221 
five-number, 103 
of four levels of 
measurement, 12 
of hypothesis testing, 414, 415 
of probability, 160 
survey, 17 
survey questions 
closed question, 25 
open question, 25 
symmetric, frequency 
distribution, 71 
systematic sample, 22 


T 


table, contingency, 551 
t-distribution, 318 
constructing a confidence 
interval for the mean, 320 
finding critical values in, 387 
test 
chi-square 
goodness-of-fit, 540, 542 
independence, 553 
homogeneity of proportions, 
561 
hypothesis, 356 
Kruskal-Wallis, 619 
left-tailed, 362 
nonparametric, 598 
one-way analysis of 
variance, 574, 575 
paired-sample sign, 602 
power of the, 361 
for randomness, runs, 632 
right-tailed, 362 
Scheffé, 586 
sign, 598 
two-tailed, 362 
two-way analysis of variance, 
580 
Wilcoxon rank sum, 611 
Wilcoxon signed-rank, 609 
test statistic, 361 
for a chi-square test, 406, 415 
for the correlation coefficient, 
492 
for the difference between 
means, 452 
for the difference between
proportions, 462 
for the Kruskal-Wallis test, 619 
for a mean 
large sample, 373, 415 
small sample, 389, 415 
for a proportion, 398, 415 
for the runs test, 633 
for the sign test, 599 
for a two-sample t-test, 442 
for a two-sample z-test, 431 
for the Wilcoxon rank sum 
test, 612 
testing the significance of the 
Spearman rank correlation 
coefficient, 626 
Theorem 
Bayes’, 154 
Central Limit, 268 
Chebychev’s, 87 
theoretical probability, 132 
third quartile, 100 
time series, 59 
chart, 59 
timeline, history of statistics, 33 
total deviation, 513 
total variation, 513 
transformation, logarithmic, 510 
transformations to achieve 
linearity, 510 
transforming a z-score to an 
x-value, 259 
treatment, 16 
tree diagram, 128 
trimmed mean, 78 
t-test 
for the correlation 
coefficient, 492 
for the difference between 
means, 452 
for a mean, 389, 415 
two-sample 
for the difference between 
means, 442 
two-sample 
F-test for variances, 568 
t-test, 442 
z-test 
for the difference 
between means, 431 
for the difference 
between proportions, 
462 
two-tailed test, 362 
for a population correlation 
coefficient, 492 
two-way analysis of variance 
test, 580 
type I error, 359 
type II error, 359


U 


unbiased estimator, 304 
unexplained deviation, 513 
unexplained variation, 513 


uniform, frequency 
distribution, 71 
uniform distribution, 248 
upper class limit, 38 
using 
the chi-square test for a 
variance or standard 
deviation, 406 
the normal distribution 
to approximate binomial 
probabilities, 284 
P-values for a z-test for 
a mean, 373 
rejection regions for a 
z-test for a mean, 378 
Table 11 for the 
correlation coefficient, 
490 
the t-test
for the correlation
coefficient ρ, 492
for the difference between 
means, 452 
for a mean, 389 
a two-sample F-test to 
compare σ₁² and σ₂², 568
a two-sample t-test for
the difference between 
means, 443 
a two-sample z-test
for the difference 
between means, 431 
for the difference 
between proportions, 
462 
a z-test for a proportion, 
398 


V 


value 
critical, 305, 376 
expected, 196 
probability, 361 
variable 
confounding, 18 
dependent, 484 
explanatory, 484 
independent, 484 
random, 190 
continuous, 190, 236 
discrete, 190 
response, 484 
variability, 38 
variance 
of a binomial distribution, 210 
chi-square test for, 406, 415 
confidence intervals for, 339 


of a discrete random variable, 
195 
of a geometric distribution, 
224 
mean square 
between, 574 
within, 574 
one-way analysis of, 574 
point estimate for, 337 
of a Poisson distribution, 224 
population, 81, 82 
sample, 83 
two-sample F-test for, 568 
two-way analysis of, 580 
variation 
coefficient of, 96 
explained, 513 
total, 513 
unexplained, 513 


W


weighted mean, 69 

Wilcoxon rank sum test, 611 
test statistic for, 612 

Wilcoxon signed-rank test, 609 

with replacement, 21 

without replacement, 21, 203 
X
x, random variable, 190 
Y 
y-intercept, confidence interval 
for, 523 
Z
zero, inherent, 11 
z-score, 105 
z-test 


for a mean, 373, 415 
test statistic for, 373, 415 
using P-values for, 373 
using rejection regions 
for, 378 
for a proportion, 398, 415 
test statistic for, 398, 415 
two-sample 
difference between 
means, 431 
difference between 
proportions, 462 